Jukka Zitting wrote:
> Hi,
>
> On Tue, Aug 24, 2010 at 2:07 PM, reinhard schwab <reinhard.sch...@aon.at>
> wrote:
>
>> another exception i have encountered today
>>
>> parse
>> ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/content-knowledge/flyer-tim_en.pdf
>>
>> java.lang.RuntimeException: java.io.IOException: Not a number: -
>>
>
> Looks like PDFBOX-592 that I just fixed in PDFBox trunk based on the
> suggestion in the issue.
>
> BR,
>
> Jukka Zitting
>

Now it's not a hyphen but a period (.), with the same document. I have synchronized to the PDFBox trunk and rebuilt Tika.
java.lang.RuntimeException: java.io.IOException: Not a number: .
	at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:149)
	at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:158)
	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:241)
	at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
	at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:441)
	at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:365)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:321)
	at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:241)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:87)
Caused by: java.io.IOException: Not a number: .
	at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:84)
	at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:324)
	at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:47)
	at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:146)
	... 15 more

Best regards,
reinhard
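P.S. The root cause in the trace is a strict numeric parse (`COSNumber.get`) rejecting a content-stream token that is just "." (previously just "-"). A token like that has no digits, so no standard number parser accepts it. As a minimal sketch, assuming such digit-less tokens should simply be read as 0, a lenient parse could look like this. `parseLenient` is a hypothetical helper, not PDFBox's API, and this is not necessarily how PDFBOX-592 was actually fixed.

```java
import java.math.BigDecimal;

public class LenientNumberParse {

    // Hypothetical helper (not part of PDFBox): parses a numeric token from a
    // content stream, treating degenerate tokens that contain no digits at all
    // (such as "-", "+", "." or "-.") as zero instead of throwing.
    static BigDecimal parseLenient(String token) {
        boolean hasDigit = false;
        for (int i = 0; i < token.length(); i++) {
            if (Character.isDigit(token.charAt(i))) {
                hasDigit = true;
                break;
            }
        }
        if (!hasDigit) {
            // Assumption: a malformed token with no digits is read as 0.
            return BigDecimal.ZERO;
        }
        // BigDecimal accepts forms like "-.5", ".25" and "3.", but would throw
        // NumberFormatException for a lone "." or "-" (handled above).
        return new BigDecimal(token);
    }

    public static void main(String[] args) {
        String[] tokens = {"12", "-3.5", ".25", "-", "."};
        for (String t : tokens) {
            System.out.println("'" + t + "' -> " + parseLenient(t));
        }
    }
}
```

Running `main` prints 0 for both "-" and ".", while well-formed tokens such as "-3.5" and ".25" still parse normally. Tokens that contain digits but are otherwise malformed (e.g. "1-2") still throw, so only the digit-less case is relaxed.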