[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948970#comment-13948970 ]
Tilman Hausherr commented on PDFBOX-1918:
-----------------------------------------
There will always be PDFs that are not correct. Your Java application should
simply report that and say that it won't be able to analyse the file, maybe
pointing to some FAQ with the answer to "why didn't we index that PDF that was
generated by this multi-billion-dollar corporation and is displayed by Adobe
Viewer?"
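
A minimal sketch of that "report it and move on" approach, using only the
JDK. The names PdfIndexReport, PdfAnalyzer, tryAnalyze and the FAQ URL are
hypothetical placeholders invented for illustration; the analyzer stands in
for whatever PDFBox call the application really makes.

```java
import java.io.File;
import java.io.IOException;

public class PdfIndexReport {

    /** Hypothetical stand-in for whatever code parses the PDF. */
    public interface PdfAnalyzer {
        void analyze(File pdf) throws IOException;
    }

    /** Runs the analyzer and turns a parse failure into a user-facing message. */
    public static String tryAnalyze(File pdf, PdfAnalyzer analyzer, String faqUrl) {
        try {
            analyzer.analyze(pdf);
            return "OK: " + pdf.getName();
        } catch (IOException e) {
            // The file could not be parsed: say so plainly and point to the
            // FAQ instead of crashing or silently skipping the document.
            return "Cannot analyse " + pdf.getName() + ": " + e.getMessage()
                    + " (see " + faqUrl + ")";
        }
    }

    public static void main(String[] args) {
        // Simulated parser failure, reusing the message from the reported stack trace.
        PdfAnalyzer failing = pdf -> {
            throw new IOException("Missing closing bracket for hex string. Reached EOS.");
        };
        System.out.println(tryAnalyze(new File("rpt1390780234888753.pdf"), failing,
                "https://example.org/faq#broken-pdf"));
    }
}
```

In a real application the lambda would wrap the actual parsing and indexing
code; only the IOException handling is the point here.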
The customer will then have to find a way to get a correct PDF. In this case,
either by learning about the "binary" option in his FTP software (if that was
the cause), or (if the PDF was really generated this way by Oracle) by
explaining to an Oracle help desk intern assistant in Kazakhstan the difference
between Unix newlines and Windows newlines, and then praying that this
information will get up seven hierarchy levels and reach a developer who will
put it on the "todo" list so that it is included in the next major release.
That I was able to fix this PDF is just luck. Most PDFs aren't ASCII-readable
like that one.
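
To illustrate why a newline conversion matters, here is a small sketch,
assuming a text-mode transfer that rewrites every bare LF as CR LF (one
possible direction; whether that happened to this file is not confirmed).
The class AsciiTransferDemo and the sample content stream are made up for
the demonstration. The byte count grows by one per converted newline, so a
stream length recorded before the transfer no longer matches the data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AsciiTransferDemo {

    /** Mimics a text-mode transfer that rewrites every bare LF as CR LF. */
    public static byte[] lfToCrLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            // Insert a CR only before an LF that does not already follow one.
            if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
                out.write('\r');
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up content stream with Unix newlines.
        byte[] stream = "BT\n/F1 12 Tf\n(Hello) Tj\nET\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] converted = lfToCrLf(stream);
        System.out.println("length recorded before transfer: " + stream.length);
        System.out.println("bytes present after transfer:    " + converted.length);
    }
}
```

The same shift also moves every byte offset after the first converted
newline, which would fit a startxref position that no longer points at the
XRef table.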
> PDF convert error
> -----------------
>
> Key: PDFBOX-1918
> URL: https://issues.apache.org/jira/browse/PDFBOX-1918
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Utilities
> Affects Versions: 1.8.4
> Reporter: Jr. John
> Attachments: rpt1390780234888753.pdf, rpt1390780234888753.pdf
>
>
> The current version, 1.8.4, has the same problem.
> D:\Software\pdfbox>java -jar pdfbox-app-1.8.4.jar ConvertColorspace rpt1390780234888753.pdf test.pdf
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 15353 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 12156 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
> WARNING: Did not found XRef object at specified startxref position 83636
> ConvertColorspace failed with the following exception:
> java.io.IOException: Missing closing bracket for hex string. Reached EOS.
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1023)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:816)
> at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:259)
> at org.apache.pdfbox.pdfparser.PDFStreamParser.parse(PDFStreamParser.java:133)
> at org.apache.pdfbox.ConvertColorspace.replaceColors(ConvertColorspace.java:88)
> at org.apache.pdfbox.ConvertColorspace.main(ConvertColorspace.java:385)
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:46)
--
This message was sent by Atlassian JIRA
(v6.2#6252)