[https://issues.apache.org/jira/browse/PDFBOX-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142270#comment-17142270]
Carl Grundstrom commented on PDFBOX-4894:
-----------------------------------------
I tested pdfbox-app-2.0.21-20200621.095502-54.jar on a 2.1G file that had
previously failed, and it worked just fine. I tested the ExtractText and
PDFReader commands. Thanks for putting in the fix.
> Invalid file offsets for PDF files larger than 2G
> -------------------------------------------------
>
> Key: PDFBOX-4894
> URL: https://issues.apache.org/jira/browse/PDFBOX-4894
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.20
> Environment: Linux
> Reporter: Carl Grundstrom
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.21, 3.0.0 PDFBox
>
>
> A 32-bit int is being used to calculate file offsets for COS objects. This
> works fine for small PDF files, but breaks as soon as an offset exceeds
> Integer.MAX_VALUE (2,147,483,647 bytes), which can happen once the PDF file
> is larger than 2G. For many large files (136 out of 216 in my sample set),
> negative file offsets are generated for some of the COS objects due to
> integer overflow. This results in an IOException being thrown in
> COSParser.java at line 728. Note that PDFBox treats a negative xref value
> as a reference to an object stream, but these negative offsets are not
> valid object stream references.
> I have fixed the problem in my local copy of the code by modifying
> PDFXrefStreamParser.java starting at line 158.
> Current code:
> {code}
> int offset = 0;
> for (int i = 0; i < w1; i++)
> {
>     // int arithmetic: a byte >= 0x80 shifted left by 24 sets the sign bit
>     offset += (currLine[i + w0] & 0x00ff) << ((w1 - i - 1) * 8);
> }
> {code}
> New code:
> {code}
> long offset = 0;
> for (int i = 0; i < w1; i++)
> {
>     // widen each byte to long before shifting so offsets >= 2^31 stay positive
>     offset += ((long) (currLine[i + w0] & 0x00ff)) << ((w1 - i - 1) * 8);
> }
> {code}
> I can submit a sample PDF file if desired (it will be more than 2G in size).
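For illustration, here is a minimal standalone sketch (not PDFBox code) that reproduces the overflow outside the parser. It assumes a cross-reference stream row laid out according to a /W array of [1 4 1], i.e. a 1-byte type, a 4-byte big-endian offset and a 1-byte generation number. The class XrefOffsetDemo, its methods and the sample row are hypothetical. Decoding an offset of 0x80000000 (exactly 2^31 bytes) with the int loop yields a negative value, while the long loop yields the correct offset.

```java
// Hypothetical demo class, not part of PDFBox.
class XrefOffsetDemo {

    // Buggy variant: accumulates the big-endian field in a 32-bit int,
    // mirroring the "Current code" loop from the report.
    static int decodeAsInt(byte[] row, int start, int width) {
        int value = 0;
        for (int i = 0; i < width; i++) {
            value += (row[start + i] & 0xff) << ((width - i - 1) * 8);
        }
        return value;
    }

    // Fixed variant: widens each byte to long before shifting,
    // mirroring the "New code" loop from the report.
    static long decodeAsLong(byte[] row, int start, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value += ((long) (row[start + i] & 0xff)) << ((width - i - 1) * 8);
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumed row with W = [1 4 1]:
        // type = 1, offset = 0x80000000 (2,147,483,648), generation = 0.
        byte[] row = {0x01, (byte) 0x80, 0x00, 0x00, 0x00, 0x00};
        int w0 = 1; // width of the type field
        int w1 = 4; // width of the offset field
        System.out.println("int:  " + decodeAsInt(row, w0, w1));  // prints "int:  -2147483648"
        System.out.println("long: " + decodeAsLong(row, w0, w1)); // prints "long: 2147483648"
    }
}
```

The int loop also misbehaves for offset fields wider than 4 bytes, because Java uses only the low 5 bits of an int shift distance, so a shift of 32 becomes a shift of 0. The long version handles field widths up to 8 bytes.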