[ 
https://issues.apache.org/jira/browse/PDFBOX-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050661#comment-18050661
 ] 

Michael Klink edited comment on PDFBOX-6142 at 1/8/26 4:53 PM:
---------------------------------------------------------------

{quote}
The length of "buf" isn't same as the length of the data. To prove this, ...
{quote}

Well, there is not much need to prove this: it's fairly obvious that the 
internal {{ByteArrayOutputStream}} buffer is usually longer than the actual 
data it holds.

Nonetheless, one _can_ optimize the {{ByteArrayOutputStream}} usage with 
something like {{DirectAccessByteArrayOutputStream}}, provided one later replaces

{code}
output.write(rawObject.toByteArray());
{code}
with
{code}
output.write(rawObject.toByteArray(), 0, rawObject.size());
{code}

(But overriding {{toByteArray}} to hand out the internal buffer is really bad 
style. A non-misleading method name should have been used.)
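
To make the difference concrete, here is a minimal Java sketch. It is not PDFBox code: the class {{BufferLengthSketch}}, the accessor name {{getBuffer()}} and the two helper methods are invented for illustration, and {{DirectAccessByteArrayOutputStream}} is re-declared as a hypothetical stand-in that exposes the internal buffer under a name that does not pretend to be {{toByteArray}}.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BufferLengthSketch
{
    /** Hypothetical stand-in: exposes the internal buffer without copying it. */
    static class DirectAccessByteArrayOutputStream extends ByteArrayOutputStream
    {
        DirectAccessByteArrayOutputStream(int capacity)
        {
            super(capacity);
        }

        /** Returns the internal buffer; only the first size() bytes hold data. */
        byte[] getBuffer()
        {
            return buf; // protected field of ByteArrayOutputStream
        }
    }

    /** Problematic variant: same effect as output.write(buffer), unused capacity included. */
    static void writeWholeBuffer(DirectAccessByteArrayOutputStream raw, ByteArrayOutputStream output)
    {
        byte[] buffer = raw.getBuffer();
        output.write(buffer, 0, buffer.length);
    }

    /** Corrected variant: writes only the bytes that were actually written to raw. */
    static void writeValidBytes(DirectAccessByteArrayOutputStream raw, ByteArrayOutputStream output)
    {
        output.write(raw.getBuffer(), 0, raw.size());
    }

    public static void main(String[] args)
    {
        DirectAccessByteArrayOutputStream raw = new DirectAccessByteArrayOutputStream(32);
        byte[] data = "<< /Type /Font >>".getBytes(StandardCharsets.US_ASCII);
        raw.write(data, 0, data.length);

        ByteArrayOutputStream whole = new ByteArrayOutputStream();
        ByteArrayOutputStream valid = new ByteArrayOutputStream();
        writeWholeBuffer(raw, whole);
        writeValidBytes(raw, valid);

        System.out.println("data length:          " + raw.size());
        System.out.println("buffer length:        " + raw.getBuffer().length);
        System.out.println("whole buffer written: " + whole.size() + " bytes");
        System.out.println("valid range written:  " + valid.size() + " bytes");
    }
}
```

The unused tail of the buffer is zero-filled, so the first variant appends 0x00 bytes after each object, while the second writes exactly {{size()}} bytes.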
-----
{quote}
The real mystery here is why the result file is still readable by PDFBox
{quote}
If you look at 
{{org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parseAllObjects()}}, you'll 
see:
{code}
if (finalPosition > 0 && currentPosition < finalPosition)
{
    // jump to the offset of the object to be parsed
    source.skip(finalPosition - (int) currentPosition);
}
{code}
I.e. when parsing object streams, you take the claimed position in the stream 
({{finalPosition}}) into account _only if the current position is less than 
it_. Due to the extra data you wrote into the object stream, though, the 
current position is usually larger than the claimed position, so you ignore 
the claimed position. But as you have already parsed the previous object, you 
only need to ignore the extra 0x00 bytes after that object to get to the next 
one, and the parser does this without issue.

Other parsers that always take the claimed position into account, though, will 
fail here.

Thus, no miracle. Nonetheless, ignoring the claimed position will cause issues 
with certain other object streams...
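
The effect can be reproduced with a small, self-contained Java simulation. This is only a sketch under simplifying assumptions: objects are plain whitespace-delimited tokens, the object stream header is reduced to an array of claimed offsets, and all names ({{ObjectStreamOffsetSketch}}, {{parseLenient}}, {{parseStrict}}, {{writeWithPadding}}) are hypothetical. {{parseLenient}} mirrors the condition quoted above; {{parseStrict}} stands for a parser that always seeks to the claimed offset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ObjectStreamOffsetSketch
{
    /** PDF whitespace characters; note that 0x00 counts as whitespace. */
    static boolean isWhitespace(byte b)
    {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /** Skips whitespace starting at 'from', then returns {tokenStart, tokenEnd}. */
    static int[] nextToken(byte[] data, int from)
    {
        int start = from;
        while (start < data.length && isWhitespace(data[start]))
        {
            start++;
        }
        int end = start;
        while (end < data.length && !isWhitespace(data[end]))
        {
            end++;
        }
        return new int[] { start, end };
    }

    static String text(byte[] data, int[] token)
    {
        return new String(data, token[0], token[1] - token[0], StandardCharsets.US_ASCII);
    }

    /** Uses the claimed offset only if the current position is still before it. */
    static List<String> parseLenient(byte[] data, int[] claimedOffsets)
    {
        List<String> objects = new ArrayList<>();
        int currentPosition = 0;
        for (int finalPosition : claimedOffsets)
        {
            if (finalPosition > 0 && currentPosition < finalPosition)
            {
                currentPosition = finalPosition;
            }
            int[] token = nextToken(data, currentPosition);
            objects.add(text(data, token));
            currentPosition = token[1];
        }
        return objects;
    }

    /** Always seeks to the claimed offset before reading an object. */
    static List<String> parseStrict(byte[] data, int[] claimedOffsets)
    {
        List<String> objects = new ArrayList<>();
        for (int finalPosition : claimedOffsets)
        {
            objects.add(text(data, nextToken(data, finalPosition)));
        }
        return objects;
    }

    /** Writes each object as a full zero-padded buffer (assumes it fits into bufferLength). */
    static byte[] writeWithPadding(String[] objects, int bufferLength)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String object : objects)
        {
            byte[] valid = object.getBytes(StandardCharsets.US_ASCII);
            byte[] buffer = new byte[bufferLength];
            System.arraycopy(valid, 0, buffer, 0, valid.length);
            out.write(buffer, 0, buffer.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args)
    {
        // Each object is 6 bytes long but occupies an 8-byte buffer.
        byte[] data = writeWithPadding(new String[] { "12345\n", "67890\n", "24680\n" }, 8);
        int[] claimedOffsets = { 0, 6, 12 }; // computed from the valid lengths only
        System.out.println("lenient: " + parseLenient(data, claimedOffsets));
        System.out.println("strict:  " + parseStrict(data, claimedOffsets));
    }
}
```

In this setup the lenient reader recovers all three objects because it continues from where the previous object ended, whereas the strict reader lands inside the second object when it seeks to the third claimed offset.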



> New release will produce incorrect files.
> -----------------------------------------
>
>                 Key: PDFBOX-6142
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6142
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Daniel Persson
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>         Attachments: after.pdf, before.pdf
>
>
> After simply loading and saving a file, I get multiple syntax errors in 
> the result. The original has no errors.
>  
>  
> {code:java}
> PDDocument doc = Loader.loadPDF(new File("before.pdf"));
> doc.save(new File("after.pdf")); {code}
>  
>  
> After running this code, I render the PDF with Poppler:
>  
> {code:java}
> pdftoppm version 25.08.0
> Copyright 2005-2025 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1996-2011, 2022 Glyph & Cog, LLC
> {code}
>  
>  
> Running this command
>  
> {code:java}
> pdftoppm -png after.pdf after {code}
>  
>  
> Will generate a lot of syntax errors:
> Syntax Error: Unterminated string
> Syntax Error: End of file inside dictionary
> Syntax Error (4124140): Illegal character ')'
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: Dictionary key must be a name object
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: Unterminated hex string
> Syntax Error (4127569): Illegal character <2f> in hex string
> Syntax Error (4127569): Illegal character <73> in hex string
> Syntax Error (4127569): Illegal character <6f> in hex string
> Syntax Error (4127569): Illegal character <6e> in hex string
> Syntax Error (4127569): Illegal character <74> in hex string
> Syntax Error (4127569): Illegal character <2f> in hex string
> Syntax Error (4127570): Illegal character <52> in hex string
> Syntax Error (4127571): Illegal character <4b> in hex string
> Syntax Error (4127572): Illegal character <47> in hex string
> Syntax Error (4127573): Illegal character <4e> in hex string
> Syntax Error (4127574): Illegal character <4f> in hex string
> Syntax Error (4127575): Illegal character <51> in hex string
> Syntax Error (4127577): Illegal character <2b> in hex string
> Syntax Error (4127577): Illegal character <47> in hex string
> Syntax Error (4127577): Illegal character <69> in hex string
> Syntax Error (4127577): Illegal character <6c> in hex string
> Syntax Error (4127577): Illegal character <6c> in hex string
> Syntax Error (4127577): Illegal character <53> in hex string
> Syntax Error (4127577): Illegal character <6e> in hex string
> Syntax Error (4127577): Illegal character <73> in hex string
> Syntax Error (4127580): Illegal character <2f> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error (4127580): Illegal character <6f> in hex string
> Syntax Error (4127580): Illegal character <69> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error (4127580): Illegal character <67> in hex string
> Syntax Error (4127580): Illegal character <2f> in hex string
> Syntax Error (4127580): Illegal character <57> in hex string
> Syntax Error (4127580): Illegal character <69> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error (4127580): Illegal character <73> in hex string
> Syntax Error (4127580): Illegal character <69> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error (4127580): Illegal character <6f> in hex string
> Syntax Error (4127580): Illegal character <69> in hex string
> Syntax Error (4127580): Illegal character <6e> in hex string
> Syntax Error: Unterminated hex string
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: Unterminated hex string
> Syntax Error (4128664): Illegal character <2f> in hex string
> Syntax Error (4128664): Illegal character <4d> in hex string
> Syntax Error (4128664): Illegal character <74> in hex string
> Syntax Error (4128664): Illegal character <74> in hex string
> Syntax Error (4128667): Illegal character <52> in hex string
> Syntax Error (4128667): Illegal character <2f> in hex string
> Syntax Error (4128667): Illegal character <53> in hex string
> Syntax Error (4128667): Illegal character <75> in hex string
> Syntax Error (4128667): Illegal character <74> in hex string
> Syntax Error (4128667): Illegal character <79> in hex string
> Syntax Error (4128667): Illegal character <70> in hex string
> Syntax Error (4128667): Illegal character <2f> in hex string
> Syntax Error (4128667): Illegal character <58> in hex string
> Syntax Error (4128667): Illegal character <4d> in hex string
> Syntax Error (4128667): Illegal character <4c> in hex string
> Syntax Error (4128667): Illegal character <2f> in hex string
> Syntax Error (4128667): Illegal character <54> in hex string
> Syntax Error (4128667): Illegal character <79> in hex string
> Syntax Error (4128667): Illegal character <70> in hex string
> Syntax Error (4128667): Illegal character <2f> in hex string
> Syntax Error (4128667): Illegal character <4d> in hex string
> Syntax Error (4128667): Illegal character <74> in hex string
> Syntax Error (4128667): Illegal character <74> in hex string
> Syntax Error (4128667): Illegal character '>'
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: End of file inside array
> Syntax Error: End of file inside dictionary
> Syntax Error: Unterminated string
> Syntax Error: End of file inside dictionary
> Syntax Error (4130539): Illegal character ')'
> Syntax Error: End of file inside dictionary
> Syntax Error: Unterminated string
> Syntax Error: End of file inside dictionary
> Syntax Error: Unterminated string
> Syntax Error: End of file inside dictionary
> Syntax Error: Kid object (page 1) is wrong type (integer)
>  
> The test was run on the latest commit of the 3.0 branch:
> commit 41fbf05f7b470a92ca396f5253896a3d9d253e11 (HEAD -> 3.0, upstream/3.0)
>  
>  


