[
https://issues.apache.org/jira/browse/PDFBOX-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diwakar Timilsina updated PDFBOX-1522:
--------------------------------------
Description:
I am using PDFBox 1.7.1 and when parsing some PDF files, it is throwing
exceptions and it's filling the Tomcat log very quickly (100MB in few seconds).
There was another bug filed related to this issue. I tried the patch supplied
in that bug but the issue is still there. I want to mention that the text gets
extracted successfully from the PDF. But it just throws a log of WARN messages
in the logs. As a workaround, I have set the LOG level to ERROR to avoid those
WARN messages.
Here is the problematic PDF file:
http://doratst.uark.edu/fedora/repository/default%3A1590/OBJ/Traveler20120822.pdf
Related bug:
https://issues.apache.org/jira/browse/PDFBOX-1359#comment-13584669
I am getting the following exception:
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
java.lang.NullPointerException
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
java.lang.NullPointerException
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.io.IOException: Error:
Could not find font(COSName{F53.0}) in
map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
java.io.IOException: Error: Could not find font(COSName{F53.0}) in
map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
at
org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:247)
at
dk.defxws.fedoragsearch.server.TransformerToText.getTextFromPDF(TransformerToText.java:335)
at
dk.defxws.fedoragsearch.server.TransformerToText.getText(TransformerToText.java:194)
at
dk.defxws.fedoragsearch.server.GenericOperationsImpl.getDatastreamText(GenericOperationsImpl.java:668)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:399)
at
org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:438)
at
org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:220)
at
org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
at
org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
at
org.apache.xpath.Expression.executeCharsToContentHandler(Expression.java:311)
was:
I am using PDFBox 1.7.1 and when parsing some PDF files, it is throwing
exceptions and it's filling the Tomcat log very quickly (1GB+ in few seconds).
There was another bug filed related to this issue. I tried the patch supplied
in that bug but the issue is still there.
Here is the problematic PDF file:
http://doratst.uark.edu/fedora/repository/default%3A1590/OBJ/Traveler20120822.pdf
Related bug:
https://issues.apache.org/jira/browse/PDFBOX-1359#comment-13584669
I am getting the following exception:
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
java.lang.NullPointerException
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
java.lang.NullPointerException
WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.io.IOException: Error:
Could not find font(COSName{F53.0}) in
map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
java.io.IOException: Error: Could not find font(COSName{F53.0}) in
map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
at
org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:247)
at
dk.defxws.fedoragsearch.server.TransformerToText.getTextFromPDF(TransformerToText.java:335)
at
dk.defxws.fedoragsearch.server.TransformerToText.getText(TransformerToText.java:194)
at
dk.defxws.fedoragsearch.server.GenericOperationsImpl.getDatastreamText(GenericOperationsImpl.java:668)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:399)
at
org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:438)
at
org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:220)
at
org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
at
org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
at
org.apache.xpath.Expression.executeCharsToContentHandler(Expression.java:311)
> Some PDF files are causing exception (java.io.IOException: Error: Could not
> find font(COSName{F53.0}) in map=)
> --------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-1522
> URL: https://issues.apache.org/jira/browse/PDFBOX-1522
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.7.1
> Environment: RHEL 6
> Reporter: Diwakar Timilsina
>
> I am using PDFBox 1.7.1 and when parsing some PDF files, it is throwing
> exceptions and it's filling the Tomcat log very quickly (100MB in few
> seconds). There was another bug filed related to this issue. I tried the
> patch supplied in that bug but the issue is still there. I want to mention
> that the text gets extracted successfully from the PDF. But it just throws a
> log of WARN messages in the logs. As a workaround, I have set the LOG level
> to ERROR to avoid those WARN messages.
> Here is the problematic PDF file:
> http://doratst.uark.edu/fedora/repository/default%3A1590/OBJ/Traveler20120822.pdf
> Related bug:
> https://issues.apache.org/jira/browse/PDFBOX-1359#comment-13584669
> I am getting the following exception:
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
> java.lang.NullPointerException
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
> java.lang.NullPointerException
> WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.io.IOException: Error:
> Could not find font(COSName{F53.0}) in
> map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
> F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
> java.io.IOException: Error: Could not find font(COSName{F53.0}) in
> map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923,
> F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
> at
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
> at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
> at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
> at
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
> at
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
> at
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
> at
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
> at
> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:247)
> at
> dk.defxws.fedoragsearch.server.TransformerToText.getTextFromPDF(TransformerToText.java:335)
> at
> dk.defxws.fedoragsearch.server.TransformerToText.getText(TransformerToText.java:194)
> at
> dk.defxws.fedoragsearch.server.GenericOperationsImpl.getDatastreamText(GenericOperationsImpl.java:668)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:399)
> at
> org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:438)
> at
> org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:220)
> at
> org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
> at
> org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
> at
> org.apache.xpath.Expression.executeCharsToContentHandler(Expression.java:311)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira