https://bugs.documentfoundation.org/show_bug.cgi?id=32249

--- Comment #35 from Eyal Rozenberg <[email protected]> ---
(In reply to V Stuart Foote from comment #34)
> Nope! It again illustrates the bottom line that PDF (ISO 32000-1:2008, or
> 2:2020) is NOT an editable format, it is a presentation/publication format. 

People need to edit PDFs all the time - hence it features prominently in a
video describing common tasks which desktop apps need to cater to. You get a
PDF of a form - typically scanned or printed from a word processor - and you
need to put text and/or a signature on it. That's PDF editing, and millions of
people do it every day. Ok, maybe not millions every day, let's say millions
every week.

> Also, it demonstrates reality that LibreOffice is not a PDF editor as we
> will only ever read content of a PDF to filter import to an ODF XML
> compliant document canvas.

Nobody said LO needs to represent the PDF structure as-is and perform surgical
edits. In that sense, LO isn't a .doc and .docx editor either: It only ever
reads their contents via an import filter; and it is certainly a .doc and .docx
editor. But - we've had this argument already. Why are you repeating a rebutted
point?

> And it highlights the project's need to scrupulously manage user
> expectations reinforcing that PDF is not an editable format, and that
> LibreOffice is NOT a PDF "editor".

You keep saying that, despite it having been demonstrated to you both in
principle and empirically that it is. What LO needs to manage is perhaps people's
insistence on sticking their heads in the sand and ignoring an important use of
our suite. I'll bet you there are more people using LO as a PDF editor than
users of LO Base, for example. (No offense to the LO Base folks!)

But anyway, let's focus on the practicality and the scope of this bug.

> Improvements can be made to LO filter handling as a PDF reader to import
> content--witness the adoption of pdfium libs for the insert as image filter
> paths.

That's a step in the right direction - as was the resolution of bug 104597. But
there's a very long way to go.

> But simply put, the internals of the presentation optimized text runs within
> PDF do not support extraction with the lexical syntax of the original source
> document from which a PDF was generated.

That's true, and we can never hope to restore what's not saved in a PDF. But:

1. We can avoid losing the information and styling that _is_ represented in the
PDF, so that importing-then-saving would result in a PDF with no noticeable
distortions, or almost none. At least - for PDFs of typical documents which
don't use the more esoteric features of PDFs. Of course the PDF's internal
structure will likely show a lot more differences, but the observed result will
be pleasing.

2. We can use reasonable assumptions to constitute paragraphs, define styles,
have structural elements/features like columns, tables, annotations, comments,
etc. Yes, each of these may be a lot of work and nobody expects this to
happen overnight, but if we set this as an explicit goal and have some
development resources assigned to working towards that goal then things will
gradually improve. By the way, this is mostly, even if not entirely, orthogonal
to making sure we don't mess up the PDF on import-then-export.

3. For the specific case of LO being the originator of the PDF, we could
consider - and that is out-of-scope here I suppose - embedding auxiliary
information into the PDF which allows for perfect or near-perfect
reconstitution of the original LO document.
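
To make point 2 a bit more concrete, here is a minimal sketch of one such
"reasonable assumption": consecutive text lines belong to the same paragraph
unless the vertical gap between them is noticeably larger than the usual line
spacing. This is illustrative Python, not LO's import filter code; the `Line`
type, `group_paragraphs` and the `gap_factor` threshold are all hypothetical
names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One extracted text line (hypothetical structure, not an LO type)."""
    text: str
    top: float     # distance from the page top to the line's top edge
    height: float  # line height, in the same units as `top`


def group_paragraphs(lines, gap_factor=0.5):
    """Join consecutive lines into paragraphs, splitting on large vertical gaps.

    A new paragraph starts when the gap between the bottom of the previous
    line and the top of the current one exceeds gap_factor * line height.
    """
    paragraphs = []
    current = []
    prev = None
    for line in sorted(lines, key=lambda l: l.top):
        if prev is not None:
            gap = line.top - (prev.top + prev.height)
            if gap > gap_factor * prev.height:
                paragraphs.append(" ".join(current))
                current = []
        current.append(line.text)
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Two closely spaced lines followed by one after a larger gap.
lines = [
    Line("People need to edit PDFs", top=100.0, height=12.0),
    Line("all the time.", top=114.0, height=12.0),
    Line("That's PDF editing.", top=140.0, height=12.0),
]
paragraphs = group_paragraphs(lines)
```

A real importer would also have to look at indentation, font changes, columns
and hyphenation, so this only shows the general shape of such a heuristic.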


> but there are very real limits to what the project can or should do.

Certainly, but these limits depend in part on what the project defines as a
goal or an important feature. Recognizing the use of LO as a PDF editor,
rather than denying it, will allow for setting these limits farther out.
