[ https://issues.apache.org/jira/browse/PDFBOX-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857693#comment-15857693 ]

Manuel Gübeli edited comment on PDFBOX-3677 at 2/8/17 9:08 AM:
---------------------------------------------------------------

Thank you for the fast response. I did some testing with
“pdfbox-app-2.0.5-20170207.183804-157.jar”. I think there are two fonts causing
trouble.

Running with the “ExtractText” option:
{quote}
SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
java.io.IOException: Found null but expected NAME
       at org.apache.fontbox.type1.Type1Parser.read(Type1Parser.java:763)
{quote}

{quote}
SEVERE: Can't read the embedded Type1 font AAAAAB+ArialMT
java.io.IOException: Found null but expected NAME
       at org.apache.fontbox.type1.Type1Parser.read(Type1Parser.java:763)
{quote}
I also attached the requested font files, obtained using “PDFDebugger”; see
F1.txt and F2.txt.
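The message “Found null but expected NAME” is consistent with a token read that checks for a null token and raises an IOException instead of dereferencing it, which may explain why the original NullPointerException now shows up as a logged SEVERE error. The minimal Java sketch below illustrates only that pattern, under the assumption that the lexer signals end of input by returning null. `ExpectTokenSketch`, `Kind`, `Token`, `Lexer` and `read` are hypothetical stand-ins, not FontBox's actual `Type1Parser` or lexer code.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ExpectTokenSketch {

    // Hypothetical token kinds (not FontBox's real token types).
    enum Kind { NAME, INTEGER, LITERAL }

    // Hypothetical token: a kind plus its source text.
    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Hypothetical lexer whose nextToken() returns null once the input is
    // exhausted, mirroring the null return observed for lexer.nextToken().
    static final class Lexer {
        private final Deque<Token> tokens;

        Lexer(List<Token> tokens) {
            this.tokens = new ArrayDeque<>(tokens);
        }

        Token nextToken() {
            return tokens.poll(); // poll() yields null on an empty deque
        }
    }

    // Reads the next token and requires the given kind. A null token is
    // reported as an IOException instead of being dereferenced
    // (token.kind on a null token would throw a NullPointerException).
    static Token read(Lexer lexer, Kind expected) throws IOException {
        Token token = lexer.nextToken();
        if (token == null || token.kind != expected) {
            throw new IOException("Found " + token + " but expected " + expected);
        }
        return token;
    }

    public static void main(String[] args) {
        Lexer lexer = new Lexer(Arrays.asList(new Token(Kind.NAME, "FontName")));
        try {
            System.out.println(read(lexer, Kind.NAME)); // NAME(FontName)
            read(lexer, Kind.NAME);                    // input exhausted: null token
        } catch (IOException e) {
            System.out.println(e.getMessage());        // Found null but expected NAME
        }
    }
}
```

A checked exception with a descriptive message gives the caller something it can catch and log, as the SEVERE lines above suggest happens when the embedded font is loaded, rather than an unchecked NullPointerException escaping from deep inside the parser.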


> NullPointerException in Type1Parser.read
> ----------------------------------------
>
>                 Key: PDFBOX-3677
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3677
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.3, 2.0.4
>         Environment: Windows 10, java version "1.8.0_25"
>            Reporter: Manuel Gübeli
>             Fix For: 2.0.5, 2.1.0
>
>         Attachments: StackTrace.txt
>
>
> Text extraction from certain PDFs is not possible; PDFBox responds with a
> NullPointerException. Text extraction from the same PDF works with version
> 1.8.13.
> The issue was originally discovered while using the newest Apache Tika 1.14
> library. I cannot downgrade to PDFBox 1.8.13 with Apache Tika 1.14.
> Unfortunately, I cannot provide you with the PDFs that fail. However, I did
> some testing and found out that “Token token = lexer.nextToken();” returns
> null.
> Feb 07, 2017 12:17:40 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
> SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
> java.io.IOException: Found token=null but expected NAME
> Caused by: java.io.EOFException
>       at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:302)
>       at org.apache.pdfbox.pdfparser.COSParser.checkXRefOffset(COSParser.java:1177)
>       at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:202)
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)