[jira] [Commented] (PDFBOX-4367) Error expected floating point number actual='18-5'

ASF subversion and git services (JIRA) Wed, 07 Nov 2018 10:31:08 -0800


    [ https://issues.apache.org/jira/browse/PDFBOX-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678608#comment-16678608 ]


ASF subversion and git services commented on PDFBOX-4367:
---------------------------------------------------------

Commit 1846064 from til...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1846064 ]

PDFBOX-4367: run stripper by page as preparation to catch the exception in a 
later commit; improve usage text
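
To illustrate why running the stripper one page at a time prepares for catching the exception, here is a minimal sketch. It is not the code from r1846064. `PageTextSource`, `PerPageExtraction` and `demoSource` are hypothetical names made up for this example and are not PDFBox types. In PDFBox 2.0 the per-page call would presumably be a `PDFTextStripper` restricted with `setStartPage`/`setEndPage`; a stand-in interface is used here so the example runs without the library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "extract the text of one page"; not a PDFBox type.
interface PageTextSource {
    int pageCount();

    // Page numbers are 1-based, as in PDFTextStripper.
    String textOfPage(int pageNumber) throws IOException;
}

final class PerPageExtraction {
    // Extracts page by page so that one failing page does not abort the rest.
    static String extractAll(PageTextSource source, List<Integer> failedPages) {
        StringBuilder out = new StringBuilder();
        for (int page = 1; page <= source.pageCount(); page++) {
            try {
                out.append(source.textOfPage(page));
            } catch (IOException e) {
                // Record the failure and continue with the next page.
                failedPages.add(page);
            }
        }
        return out.toString();
    }

    // Simulated three-page document whose second page has a corrupt content stream.
    static PageTextSource demoSource() {
        return new PageTextSource() {
            @Override
            public int pageCount() {
                return 3;
            }

            @Override
            public String textOfPage(int pageNumber) throws IOException {
                if (pageNumber == 2) {
                    throw new IOException("Error expected floating point number actual='18-5'");
                }
                return "page " + pageNumber + "\n";
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> failed = new ArrayList<>();
        System.out.print(extractAll(demoSource(), failed));
        System.out.println("failed pages: " + failed);
    }
}
```

With a single call over the whole document, the first bad page ends the run; with one call per page, the loop can skip the bad page and keep the text of the others.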

> Error expected floating point number actual='18-5'
> --------------------------------------------------
>
>                 Key: PDFBOX-4367
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4367
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.12
>         Environment: Mac OS X Sierra
>            Reporter: Peter Johnson
>            Priority: Minor
>
> I can reproduce this from the command line.  Unfortunately, the only files that
> trigger it come from a customer and contain sensitive information.  The file
> opens without error in Acrobat Reader and Mac Preview.  The desired result is
> that any corrupt portions of the PDF are skipped, so that we can use whatever
> text is extractable.
> Unfortunately, I still get an error when using the -force option.
> We get the following stack trace:
> {code:java}
> C02V390UHTD6:Downloads pjohnson$ java -jar pdfbox-app-2.0.12.jar ExtractText 16cccd9af5032a303774f7b87fb95076.pdf
> Nov 02, 2018 10:04:54 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSArray
> WARNING: Corrupt object reference at offset 19727
> Exception in thread "main" java.io.IOException: Error expected floating point number actual='18-5'
> at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:78)
> at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:110)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:947)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:631)
> at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:174)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
> at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
> at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
> at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
> at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
> at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> Caused by: java.lang.NumberFormatException
> at java.math.BigDecimal.<init>(BigDecimal.java:494)
> at java.math.BigDecimal.<init>(BigDecimal.java:383)
> at java.math.BigDecimal.<init>(BigDecimal.java:806)
> at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:59)
> ... 14 more
> {code}
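
The trace above shows a `NumberFormatException` from the `java.math.BigDecimal` constructor surfacing as an `IOException` from `COSFloat`. The sketch below reproduces that failure mode with the JDK alone. `FloatTokenDemo` and `parse` are hypothetical names for illustration. This is not the PDFBox source, only a model of the wrap-and-rethrow pattern the trace suggests.

```java
import java.io.IOException;
import java.math.BigDecimal;

final class FloatTokenDemo {
    // Parses a numeric token; a malformed token is reported as an IOException
    // whose cause is the original NumberFormatException, as in the trace.
    static BigDecimal parse(String token) throws IOException {
        try {
            return new BigDecimal(token);
        } catch (NumberFormatException e) {
            throw new IOException(
                    "Error expected floating point number actual='" + token + "'", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A well-formed token parses normally.
        System.out.println(parse("18.5"));

        // A token with a minus sign in the middle is rejected by BigDecimal.
        try {
            parse("18-5");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The token `18-5` is rejected because `BigDecimal` accepts a sign only at the start of the number or of its exponent.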



