Daniel Persson created PDFBOX-4501:
--------------------------------------

             Summary: References numbers in embedded PDF become floats
                 Key: PDFBOX-4501
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4501
             Project: PDFBox
          Issue Type: Bug
            Reporter: Daniel Persson
         Attachments: float_pointer.patch

Hi everyone.

We found an issue that occasionally occurs with PDF files from smaller 
producers that embed advertisements or other articles. 

For some reason, this embedded content causes the library to throw an exception 
and fail to read the file. In many cases, we can read most of the pages, and 
only this embedded data is missing.

I wrote a small patch that works around the issue, but I don't know how to 
decode the embedded data, so I have not debugged it further. I am linking to 
the file below because it is 124 MB, which is too large to attach to the issue.

[https://drive.google.com/file/d/1hQslqtrbIoo5bTmMXgH1NDSYXuvIUOAQ/view?usp=sharing]

It would be great to find a solution where the PDF is read correctly, but even 
short of that, the current behavior of not reading the file at all is a problem.

 

```

java.io.IOException: expected number, actual=COSFloat{18446744073221199360} at offset 127766191
 org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
 org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
 org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
 org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
 org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912)
 org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881)
 org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801)
 org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:761)
 org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
 org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
 org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
 org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
 org.apache.pdfbox.debugger.PDFDebugger$12.open(PDFDebugger.java:1272)
 org.apache.pdfbox.debugger.PDFDebugger$DocumentOpener.parse(PDFDebugger.java:1383)
 org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1275)
 org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1252)
 org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1243)

```
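
For context on the value in the exception message, here is a minimal 
standalone sketch that does not use PDFBox at all. It only shows that the token 
18446744073221199360 is too large for a Java long, which would be one plausible 
reason for a parser to fall back to a floating-point object. That fallback is 
an assumption on my side, not something I verified in the PDFBox source, and 
the class and method names (HugeNumberCheck, fitsInLong, minusTwoTo64) are made 
up for illustration.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of PDFBox.
final class HugeNumberCheck {

    /** Returns true if the decimal token can be represented as a Java long. */
    static boolean fitsInLong(String token) {
        try {
            Long.parseLong(token);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for values outside Long.MIN_VALUE..Long.MAX_VALUE
            return false;
        }
    }

    /** Returns the token's value minus 2^64, to show how close it is to 2^64. */
    static BigInteger minusTwoTo64(String token) {
        return new BigInteger(token).subtract(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        // Value taken from the exception message above
        String token = "18446744073221199360";
        System.out.println(fitsInLong(token));   // prints "false"
        System.out.println(minusTwoTo64(token)); // prints "-488352256"
    }
}
```

The value sits just below 2^64, so it may be a negative number that the 
producer wrote out as an unsigned 64-bit integer, but that is only a guess.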



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
