Carl Grundstrom created PDFBOX-4894:
---------------------------------------
Summary: Invalid file offsets for PDF files larger than 2G
Key: PDFBOX-4894
URL: https://issues.apache.org/jira/browse/PDFBOX-4894
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.20
Environment: Linux
Reporter: Carl Grundstrom
Fix For: 2.0.21
A 32-bit int is being used to calculate file offsets for COS objects. This works
fine for small PDF files, but breaks when the PDF file is larger than 2 GB: any
offset above Integer.MAX_VALUE (2^31 - 1) overflows. For many large files (136
out of 216 in my sample set), negative file offsets are generated for some of
the COS objects as a result. This causes an IOException to be thrown in
COSParser.java at line 728. Note that these negative offsets are not valid
object stream references.

I have fixed the problem by modifying the code in PDFXrefStreamParser.java,
starting at line 158.
Current code:
    int offset = 0;
    for(int i = 0; i < w1; i++)
    {
        offset += (currLine[i + w0] & 0x00ff) << ((w1 - i - 1) * 8);
    }
New code:
    long offset = 0;
    for(int i = 0; i < w1; i++)
    {
        offset += ((long)(currLine[i + w0] & 0x00ff)) << ((w1 - i - 1) * 8);
    }
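
To illustrate the overflow outside of PDFBox, here is a minimal standalone
sketch. The class and method names (XrefOffsetDemo, decodeAsInt, decodeAsLong)
and the sample entry bytes are made up for illustration; only the two loop
bodies are taken from the snippets above.

```java
// Hypothetical demo class; none of these names exist in PDFBox.
final class XrefOffsetDemo {

    // Mirrors the current loop: each byte is shifted as a 32-bit int.
    static int decodeAsInt(byte[] currLine, int w0, int w1) {
        int offset = 0;
        for (int i = 0; i < w1; i++) {
            offset += (currLine[i + w0] & 0x00ff) << ((w1 - i - 1) * 8);
        }
        return offset;
    }

    // Mirrors the proposed fix: widen each byte to long before shifting.
    static long decodeAsLong(byte[] currLine, int w0, int w1) {
        long offset = 0;
        for (int i = 0; i < w1; i++) {
            offset += ((long) (currLine[i + w0] & 0x00ff)) << ((w1 - i - 1) * 8);
        }
        return offset;
    }

    public static void main(String[] args) {
        // Assumed layout: one type byte (w0 = 1), then a 4-byte big-endian
        // offset field (w1 = 4) holding 0x80000000, i.e. exactly 2^31.
        byte[] entry = {0x01, (byte) 0x80, 0x00, 0x00, 0x00};
        System.out.println(decodeAsInt(entry, 1, 4));  // -2147483648
        System.out.println(decodeAsLong(entry, 1, 4)); // 2147483648
    }
}
```

With a 4-byte offset field, the int version turns negative as soon as the top
bit of the first byte is set. With a field wider than 4 bytes the int version is
also wrong, because Java only uses the low five bits of the shift distance for
int operands.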
I can submit a sample PDF file if desired (it will be more than 2 GB in size).