https://bugs.documentfoundation.org/show_bug.cgi?id=32249
--- Comment #35 from Eyal Rozenberg <[email protected]> ---
(In reply to V Stuart Foote from comment #34)
> Nope! It again illustrates the bottom line that PDF (ISO 32000-1:2008, or
> 2:2020) is NOT an editable format, it is a presentation/publication format.

People need to edit PDFs all the time - which is why it features prominently in a video describing common tasks that desktop apps need to cater to. You get a PDF of a form - typically scanned, or printed from a word processor - and you need to put text and/or a signature on it. That's PDF editing, and millions of people do it every day. Ok, maybe not millions every day, let's say millions every week.

> Also, it demonstrates reality that LibreOffice is not a PDF editor as we
> will only ever read content of a PDF to filter import to an ODF XML
> compliant document canvas.

Nobody said LO needs to represent the PDF structure as-is and perform surgical edits. In that sense, LO isn't a .doc and .docx editor either: it only ever reads their contents via an import filter, and yet it certainly is a .doc and .docx editor. But - we've had this argument already. Why are you repeating a rebutted point?

> And it highlights the project's need to scrupulously manage user
> expectations reinforcing that PDF is not an editable format, and that
> LibreOffice is NOT a PDF "editor".

You keep saying that, despite it having been demonstrated to you, both in principle and empirically, that it is. What LO needs to manage is perhaps people's insistence on sticking their heads in the sand and ignoring an important use of our suite. I'll bet you there are more people using LO as a PDF editor than there are users of LO Base, for example. (No offense to the LO Base folks!)

But anyway, let's focus on the practicality and the scope of this bug.

> Improvements can be made to LO filter handling as a PDF reader to import
> content--witness the adoption of pdfium libs for the insert as image filter
> paths.
That's a step in the right direction - as was the resolution of bug 104597. But there's a very long way to go.

> But simply put, the internals of the presentation optimized text runs within
> PDF do not support extraction with the lexical syntax of the original source
> document from which a PDF was generated.

That's true, and we can never hope to restore what's not saved in a PDF. But:

1. We can avoid losing the information and styling that _is_ represented in the PDF, so that importing-then-saving would result in a PDF with no noticeable distortions, or almost none - at least for PDFs of typical documents, which don't use the more esoteric features of PDF. Of course, the PDF's internal structure will likely show a lot more differences, but the observed result will be pleasing.

2. We can use reasonable assumptions to constitute paragraphs, define styles, and have structural elements/features like columns, tables, annotations, comments, etc. Yes, each of these may be a lot of work, and nobody expects this to happen overnight; but if we set this as an explicit goal, and have some development resources assigned to working towards it, then things will gradually improve. By the way, this is mostly, even if not entirely, orthogonal to making sure we don't mess up the PDF on import-then-export.

3. For the specific case of LO being the originator of the PDF, we could consider - and that is out-of-scope here, I suppose - embedding auxiliary information into the PDF which allows for perfect or near-perfect reconstitution of the original LO document.

> but there are very real limits to what the project can or should do.

Certainly, but these limits depend in part on what the project defines as a goal or an important feature. Recognizing the use of LO as a PDF editor, rather than denying it, will allow for setting these limits farther out.
