[jira] [Commented] (PDFBOX-5283) No Content - xRef / Obj Parsing

Michael Klink (Jira) Thu, 23 Sep 2021 06:59:05 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419232#comment-17419232 ]


Michael Klink commented on PDFBOX-5283:
---------------------------------------

{quote}[~msahyoun]> Isn't that what one would expect?
{quote}
Naïve users (who don't care and simply don't want issues) and lazy programmers 
(who don't want to have to explain to support / clients / customers / users) 
may expect this.

But it is bad. PDFBox is a library, in particular one for automatically creating 
and processing PDFs, which is something entirely different from an interactive 
PDF viewer or editor.

If an interactive PDF viewer or editor (like Adobe Acrobat) comes across an 
error in a file it loads and somehow fixes (or mis-fixes) that error under the 
hood, there is still the user to check whether the displayed result makes sense. 
So here one could argue that under-the-hood fixes are a feature.

If an automatic process (e.g. one based on PDFBox) comes across an error in a 
file it loads and somehow fixes (or mis-fixes) that error under the hood, there 
is no such check. A single mis-fixed input PDF may result in thousands of PDFs 
with utterly broken content being sent to potential customers, who then are 
less inclined to become customers. Or, even worse, those PDFs may go into an 
archive into which one is legally required to put certain documents, and during 
the next audit one has to provide those documents and finds only garbage...

For that reason alone *PDF libraries for automatic PDF processing must not be 
as lax as interactive PDF processors*.
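The difference between a lax and a strict processor can be sketched as a policy switch. The following is a minimal, hypothetical illustration, not PDFBox's API: the function `resolve_object`, the exception `AmbiguousObjectError` and the input format (object number mapped to every offset where a definition was found) are all made up for this example. When an object is defined more than once, the lenient path silently picks one candidate, while the strict path refuses to guess and raises an error that an unattended pipeline can route to a human.

```python
class AmbiguousObjectError(Exception):
    """Hypothetical error: an object number has several candidate definitions."""


def resolve_object(candidates: dict[int, list[int]], obj_num: int,
                   strict: bool = True) -> int:
    """Return the byte offset to use for obj_num (hypothetical helper).

    candidates maps an object number to every byte offset at which a
    definition of that object was found.
    """
    offsets = candidates.get(obj_num, [])
    if not offsets:
        raise KeyError(f"object {obj_num} not found")
    if len(offsets) > 1:
        if strict:
            # Fail loudly: an automatic process should not guess.
            raise AmbiguousObjectError(
                f"object {obj_num} defined at offsets {offsets}")
        # Lenient "under the hood" repair: silently take the last definition.
        return offsets[-1]
    return offsets[0]


# Lenient mode quietly picks one definition; nobody is told.
print(resolve_object({8: [1967, 7782]}, 8, strict=False))  # -> 7782
```

With `strict=True` (the default here) the same call raises `AmbiguousObjectError`, which is the behaviour the argument above asks for in automatic processing: the ambiguity surfaces instead of being papered over.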

But there also is a security-related aspect: PDF signing and encryption schemes 
nowadays are usually verified to be secure if used in accordance with the PDF 
specification. But if PDF processors apply some repairs under the hood, the 
security of those schemes is suddenly no longer guaranteed. E.g. applying 
signatures may interfere with such repairs in such a way that the prospective 
signer initially sees something different from what the signed file will show.
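One reason repairs undermine signatures: a PDF signature covers bytes, not their interpretation. The signature dictionary's /ByteRange array names two (offset, length) ranges that normally cover the whole file except the signature value itself, and the digest is computed over those bytes. The minimal sketch below assumes SHA-256 and uses a toy byte string rather than a real signed PDF; `byte_range_digest` is a hypothetical helper, not a library function.

```python
import hashlib


def byte_range_digest(data: bytes, byte_range: tuple[int, int, int, int]) -> str:
    """Hash the two ranges named by a /ByteRange [a b c d] array.

    SHA-256 is an assumption here; real signatures may use other digests.
    """
    a, b, c, d = byte_range
    h = hashlib.sha256()
    h.update(data[a:a + b])  # bytes before the signature value
    h.update(data[c:c + d])  # bytes after the signature value
    return h.hexdigest()


# Toy "file": the <...> part stands in for the signature value placeholder.
data = b"HEADER<0000>TRAILER"
c = data.index(b">") + 1
byte_range = (0, data.index(b"<"), c, len(data) - c)
print(byte_range_digest(data, byte_range)
      == hashlib.sha256(b"HEADERTRAILER").hexdigest())  # True
```

Nothing in this digest depends on which of two conflicting object definitions a processor chooses to use. If both lie inside the signed ranges, the signature validates either way, which is how differing repairs can let the signer and a later verifier see different content under one valid signature.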

But I digress...

> No Content - xRef / Obj Parsing
> -------------------------------
>
>                 Key: PDFBOX-5283
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5283
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24, 3.0.0 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>         Attachments: Lieferschein_110300.pdf
>
>
> There seems to be an issue with xRef / object reading when parsing the 
> attached pdf.
> The PDF itself has for example two objects with the ref "8 0 R":
> One at position 1967 with a "/Content" entry.
> One at position 7782 without a "/Content" entry.
> Both are referenced in the XRef Table, so there seems to be something off. 
> Probably Acrobat, etc. are using the first object, while PDFBox is using the 
> second one.
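A quick way to confirm the situation described in the report (the same object number defined at more than one offset) is to scan the raw bytes for indirect-object headers such as `8 0 obj`. This is a minimal heuristic sketch, not how PDFBox parses files: `find_duplicate_objects` is a hypothetical helper, it does not consult the xref table, and it can produce false hits on header-like text inside streams or strings.

```python
import re
from collections import defaultdict

# Matches an indirect-object header such as "8 0 obj". The lookbehind
# ensures the object number is not the tail of a longer number.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def find_duplicate_objects(data: bytes) -> dict[tuple[int, int], list[int]]:
    """Map (object number, generation) to every offset where it is defined,
    keeping only those defined more than once."""
    seen: dict[tuple[int, int], list[int]] = defaultdict(list)
    for m in OBJ_HEADER.finditer(data):
        seen[(int(m.group(1)), int(m.group(2)))].append(m.start())
    return {key: offsets for key, offsets in seen.items() if len(offsets) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python find_dups.py Lieferschein_110300.pdf
    with open(sys.argv[1], "rb") as f:
        print(find_duplicate_objects(f.read()))
```

Which of the reported offsets an xref entry actually points to still has to be checked against the cross-reference section(s) themselves; the scan only shows that a choice between candidates exists.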



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
