[ 
https://issues.apache.org/jira/browse/TIKA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025847#comment-17025847
 ] 

Jan Vlug commented on TIKA-3031:
--------------------------------

[~nick]: You are right.

I created: https://issues.apache.org/jira/browse/PDFBOX-4753

Should we leave this issue open until it is fixed in PDFBox/FontBox, or close
it?

> NumberFormatException while parsing a certain PDF document
> ----------------------------------------------------------
>
>                 Key: TIKA-3031
>                 URL: https://issues.apache.org/jira/browse/TIKA-3031
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.23
>            Reporter: Jan Vlug
>            Priority: Major
>         Attachments: aab-angola-okt-2003.pdf
>
>
> I have a document for which Tika produces the following stack trace:
> Apache Tika was unable to parse the document
> at /home/jan/Projects/KOOP/tika/problematische_documenten/aab-angola-okt-2003.pdf.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@407a83f0
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>  at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
>  at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358)
>  at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309)
>  at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:267)
>  at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>  at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>  at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>  at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>  at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>  at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:842)
>  at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:886)
>  at java.awt.Component.processMouseEvent(Component.java:6539)
>  at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>  at java.awt.Component.processEvent(Component.java:6304)
>  at java.awt.Container.processEvent(Container.java:2239)
>  at java.awt.Component.dispatchEventImpl(Component.java:4889)
>  at java.awt.Container.dispatchEventImpl(Container.java:2297)
>  at java.awt.Component.dispatchEvent(Component.java:4711)
>  at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4904)
>  at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4535)
>  at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4476)
>  at java.awt.Container.dispatchEventImpl(Container.java:2283)
>  at java.awt.Window.dispatchEventImpl(Window.java:2746)
>  at java.awt.Component.dispatchEvent(Component.java:4711)
>  at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:760)
>  at java.awt.EventQueue.access$500(EventQueue.java:97)
>  at java.awt.EventQueue$3.run(EventQueue.java:709)
>  at java.awt.EventQueue$3.run(EventQueue.java:703)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
>  at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:84)
>  at java.awt.EventQueue$4.run(EventQueue.java:733)
>  at java.awt.EventQueue$4.run(EventQueue.java:731)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
>  at java.awt.EventQueue.dispatchEvent(EventQueue.java:730)
>  at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
>  at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>  at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>  at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>  at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>  at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
> Caused by: java.lang.NumberFormatException: For input string: "E-00048828125"
>  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>  at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>  at java.lang.Double.parseDouble(Double.java:538)
>  at java.lang.Double.valueOf(Double.java:502)
>  at org.apache.fontbox.cff.CFFParser.readRealNumber(CFFParser.java:415)
>  at org.apache.fontbox.cff.CFFParser.readEntry(CFFParser.java:278)
>  at org.apache.fontbox.cff.CFFParser.readDictData(CFFParser.java:244)
>  at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:422)
>  at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:122)
>  at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:75)
>  at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:102)
>  at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:74)
>  at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
>  at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:61)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:506)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:480)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153)
>  at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
>  at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
>  at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:153)
>  at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:867)
>  at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>  at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:124)
>  at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:162)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  ... 45 more
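The root cause sits in FontBox: a CFF DICT real-number operand decoded to the text "E-00048828125", which has an exponent marker but no mantissa digits, so Double.valueOf throws an unchecked NumberFormatException that surfaces in Tika as "Unexpected RuntimeException". The sketch below is stand-alone and hypothetical (the class CffRealSketch and its methods are illustrative names, not PDFBox code and not the actual fix for PDFBOX-4753). It decodes a real operand nibble by nibble following the CFF specification (Adobe Technical Note #5176: 0-9 digits, 0xA ".", 0xB "E", 0xC "E-", 0xD reserved, 0xE "-", 0xF end), assuming the leading operator byte 30 has already been consumed, and reports malformed text as a checked IOException so a caller could handle one bad font instead of aborting the whole document.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch of CFF real-number decoding; not PDFBox/FontBox code. */
final class CffRealSketch {

    /** Reads nibbles until the 0xF terminator and returns the raw textual form. */
    static String decodeRealNibbles(DataInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int b = in.readUnsignedByte();
            int[] nibbles = {b >> 4, b & 0x0F}; // high nibble first
            for (int nibble : nibbles) {
                switch (nibble) {
                    case 0xA: sb.append('.'); break;
                    case 0xB: sb.append('E'); break;
                    case 0xC: sb.append("E-"); break;
                    case 0xD: break; // reserved: ignored in this sketch
                    case 0xE: sb.append('-'); break;
                    case 0xF: return sb.toString(); // end of number; padding nibble ignored
                    default: sb.append((char) ('0' + nibble)); // digits 0x0..0x9
                }
            }
        }
    }

    /** Parses the decoded text, turning malformed input into a checked exception. */
    static double readReal(DataInputStream in) throws IOException {
        String text = decodeRealNibbles(in);
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            throw new IOException("Malformed CFF real number: \"" + text + "\"", e);
        }
    }

    public static void main(String[] args) {
        // Operand bytes (prefix 30 already consumed) that decode to the string in the trace.
        byte[] bad = {(byte) 0xC0, 0x00, 0x48, (byte) 0x82, (byte) 0x81, 0x25, (byte) 0xFF};
        try {
            readReal(new DataInputStream(new ByteArrayInputStream(bad)));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Well-formed operands such as the specification's examples (E2 A2 5F for -2.25, 0A 14 05 41 C3 FF for 0.140541E-3) parse normally; only the mantissa-less value is rejected. Whether a library should instead substitute a default value for such an operand is a separate policy decision that this sketch deliberately does not make.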



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
