[ https://issues.apache.org/jira/browse/PDFBOX-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586169#comment-13586169 ]

Andreas Lehmkühler commented on PDFBOX-1522:
--------------------------------------------

That NPE has another cause:

25.02.2013 20:14:12 org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNUNG: java.lang.NumberFormatException: For input string: "1.52587894E-"
java.lang.NumberFormatException: For input string: "1.52587894E-"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
        at java.lang.Double.valueOf(Double.java:447)
        at org.apache.fontbox.cff.CFFParser.readRealNumber(CFFParser.java:318)
        at org.apache.fontbox.cff.CFFParser.readEntry(CFFParser.java:191)
        at org.apache.fontbox.cff.CFFParser.readDictData(CFFParser.java:166)
        at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:528)
        at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:109)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:322)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
        at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:92)
        at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:187)
        at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
        at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
        at org.apache.pdfbox.pdfviewer.PDFPagePanel.paint(PDFPagePanel.java:98)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JViewport.paint(JViewport.java:728)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JLayeredPane.paint(JLayeredPane.java:559)
        at javax.swing.JComponent.paintChildren(JComponent.java:837)
        at javax.swing.JComponent.paint(JComponent.java:1009)
        at javax.swing.JComponent.paintWithOffscreenBuffer(JComponent.java:4966)
        at javax.swing.JComponent.paintDoubleBuffered(JComponent.java:4919)
        at javax.swing.JComponent._paintImmediately(JComponent.java:4862)
        at javax.swing.JComponent.paintImmediately(JComponent.java:4669)
        at javax.swing.RepaintManager.paintDirtyRegions(RepaintManager.java:451)
        at javax.swing.SystemEventQueueUtilities$ComponentWorkRequest.run(SystemEventQueueUtilities.java:110)
        at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:209)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:461)
        at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)


The cause is a malformed floating-point value: the exponent digits after 
"E-" are missing. I fixed that in revision 1449818 by appending "0" as the 
exponent. I'm not sure that will work in every case, but it seems suitable 
for this one.
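
For illustration, here is a minimal sketch of that kind of repair. It is not 
the code committed in revision 1449818; the class and method names 
(LenientRealParser, repairExponent, parse) are hypothetical, and it assumes 
the real number has already been decoded into a string such as 
"1.52587894E-" before it reaches Double.valueOf.

```java
/**
 * Sketch only: repairs a real-number string whose exponent digits are
 * missing (for example "1.52587894E-") by appending "0" before parsing.
 * All names here are hypothetical, not taken from FontBox.
 */
public final class LenientRealParser {

    private LenientRealParser() {
        // utility class, not meant to be instantiated
    }

    /** Appends "0" if the string ends with a bare exponent marker. */
    static String repairExponent(String raw) {
        String s = raw.trim();
        if (s.endsWith("E") || s.endsWith("E-")) {
            // Assumption: treating a missing exponent as 0 is acceptable.
            return s + "0";
        }
        return s;
    }

    /** Parses the (possibly repaired) string as a double. */
    static double parse(String raw) {
        return Double.parseDouble(repairExponent(raw));
    }

    public static void main(String[] args) {
        // The value from the log above, which Double.valueOf rejects as-is.
        System.out.println(parse("1.52587894E-"));
    }
}
```

With this repair the value from the log is read as 1.52587894 (that is, 
1.52587894E-0). Whether 0 is the exponent the font producer intended cannot 
be known from the data, so the result is a best guess rather than a 
guaranteed-correct value.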

                
> Some PDF files are causing exception (java.io.IOException: Error: Could not 
> find font(COSName{F53.0}) in map=)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1522
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1522
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.7.1
>         Environment: RHEL 6
>            Reporter: Diwakar Timilsina
>            Priority: Minor
>
> I am using PDFBox 1.7.1, and when parsing some PDF files it throws 
> exceptions that fill the Tomcat log very quickly (100 MB in a few 
> seconds). Another bug was filed for a related issue. I tried the 
> patch supplied in that bug, but the problem persists. Note that the 
> text is still extracted successfully from the PDF; PDFBox just writes a 
> lot of WARN messages to the log. As a workaround, I have set the log level 
> to ERROR to suppress those WARN messages.
> Here is the problematic PDF file:
> http://doratst.uark.edu/fedora/repository/default%3A1590/OBJ/Traveler20120822.pdf
> Related bug:
> https://issues.apache.org/jira/browse/PDFBOX-1359#comment-13584669
> I am getting the following exception:
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
> java.lang.NullPointerException
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
> java.lang.NullPointerException
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.io.IOException: Error: 
> Could not find font(COSName{F53.0}) in 
> map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923, 
> F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
> java.io.IOException: Error: Could not find font(COSName{F53.0}) in 
> map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923, 
> F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
>       at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
>       at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
>       at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
>       at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
>       at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
>       at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
>       at 
> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:247)
>       at 
> dk.defxws.fedoragsearch.server.TransformerToText.getTextFromPDF(TransformerToText.java:335)
>       at 
> dk.defxws.fedoragsearch.server.TransformerToText.getText(TransformerToText.java:194)
>       at 
> dk.defxws.fedoragsearch.server.GenericOperationsImpl.getDatastreamText(GenericOperationsImpl.java:668)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:399)
>       at 
> org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:438)
>       at 
> org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:220)
>       at 
> org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
>       at 
> org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
>       at 
> org.apache.xpath.Expression.executeCharsToContentHandler(Expression.java:311)

