Jukka Zitting wrote:
> Hi,
>
> On Tue, Aug 24, 2010 at 2:07 PM, reinhard schwab <[email protected]>
> wrote:
>
>> another exception i have encountered today
>>
>> parse
>> ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/content-knowledge/flyer-tim_en.pdf
>>
>> java.lang.RuntimeException: java.io.IOException: Not a number: -
>>
>
> Looks like PDFBOX-592 that I just fixed in PDFBox trunk based on the
> suggestion in the issue.
>
> BR,
>
> Jukka Zitting
>
>
Now the offending token is no longer a hyphen but a period. Same document.
I have updated to the PDFBox trunk and rebuilt Tika.
java.lang.RuntimeException: java.io.IOException: Not a number: .
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:149)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:158)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:241)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:441)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:365)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:321)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:241)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:87)
Caused by: java.io.IOException: Not a number: .
    at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:84)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:324)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:47)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:146)
    ... 15 more
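
Both failing tokens, "-" earlier and "." now, are a sign or decimal point with no digits. To illustrate the difference between a strict and a tolerant parse, here is a minimal sketch in plain Java. The class LenientNumberSketch and the helper parseOrNull are hypothetical names made up for this example; this is not the code in COSNumber.get, and it is only an assumption that a tolerant fallback is the right fix for this document.

```java
import java.math.BigDecimal;

public class LenientNumberSketch {

    // Hypothetical helper, not part of PDFBox: returns null instead of
    // throwing when the token has no digits (for example "-" or ".").
    public static BigDecimal parseOrNull(String token) {
        try {
            return new BigDecimal(token);
        } catch (NumberFormatException e) {
            // A lone sign or decimal point is not a valid number.
            return null;
        }
    }

    public static void main(String[] args) {
        // Well-formed numbers first, then the two tokens from the reports.
        String[] tokens = {"12.5", "-3", ".5", "-", "."};
        for (String token : tokens) {
            BigDecimal value = parseOrNull(token);
            String shown = (value == null) ? "not a number" : value.toPlainString();
            System.out.println("'" + token + "' -> " + shown);
        }
    }
}
```

Whether a parser should then skip such a token, treat it as zero, or still fail is a separate decision that this sketch does not settle.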
best regards
reinhard