[ https://issues.apache.org/jira/browse/TIKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich resolved TIKA-972.
----------------------------------
Resolution: Fixed
Marking as Fixed, since PDFBOX-1512 was fixed in PDFBox 1.8.8 (Tika's current
version).
> Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser .
> -----------------------------------------------------------------------
>
> Key: TIKA-972
> URL: https://issues.apache.org/jira/browse/TIKA-972
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9
> Environment: Core Java, Windows Server 2003
> Reporter: Priya Kujur
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> While extracting text from a PDF, Tika throws a runtime exception. The
> exception is not thrown when the Java code runs on Windows 7, but it is
> thrown when the same code runs on Windows Server 2003.
> This is strange: my development environment is Windows 7 and my production
> environment is Windows Server 2003. Java is supposed to be platform
> independent, so this issue is driving me crazy.
> Any kind of help is much appreciated.
> Please check the stack trace:
> java.io.IOException:
>     at org.apache.tika.parser.ParsingReader.read(ParsingReader.java:271)
>     at java.io.BufferedReader.fill(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at com.servient.utilities.textmanipulation.ReaderUtil.readBuffer(ReaderUtil.java:39)
>     at com.servient.mapi.metadata.factory.TikaMetaDataExport.processFile(TikaMetaDataExport.java:255)
>     at com.servient.mapi.metadata.factory.BaseMetadataExport.process(BaseMetadataExport.java:37)
>     at com.servient.mapi.wrapper.AttachmentWrapper.saveTextMetadataExtract(AttachmentWrapper.java:116)
>     at com.servient.mapi.wrapper.AttachmentWrapper.process(AttachmentWrapper.java:40)
>     at com.servient.mapi.wrapper.AttachmentWrapper.<init>(AttachmentWrapper.java:36)
>     at com.servient.mapi.wrapper.MessageWrapper.writeCatalog(MessageWrapper.java:761)
>     at com.servient.mapi.wrapper.MessageWrapper.writeCatalog(MessageWrapper.java:754)
>     at com.servient.mapi.wrapper.MessageWrapper.process(MessageWrapper.java:804)
>     at com.servient.mapi.MAPI.main(MAPI.java:190)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@ea0a39
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>     at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(Unknown Source)
>     at java.util.TimSort.mergeAt(Unknown Source)
>     at java.util.TimSort.mergeCollapse(Unknown Source)
>     at java.util.TimSort.sort(Unknown Source)
>     at java.util.TimSort.sort(Unknown Source)
>     at java.util.Arrays.sort(Unknown Source)
>     at java.util.Collections.sort(Unknown Source)
>     at org.apache.pdfbox.util.PDFTextStripper.writePage(PDFTextStripper.java:551)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:443)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
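
For readers unfamiliar with the innermost cause in the trace above: java.util.TimSort throws "Comparison method violates its general contract!" when it detects that the Comparator passed to Collections.sort is not a consistent ordering. The sketch below is only an illustration of that kind of defect, not PDFBox's actual comparator. The class name, the FUZZY comparator, and the obeysContract helper are all hypothetical. It checks a tolerance-based comparator against the sign-symmetry and transitivity rules documented for Comparator.compare.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ComparatorContractCheck {

    // Hypothetical comparator: values closer than 1.0 are treated as equal.
    // "Equal within a tolerance" is not transitive, so this breaks the contract.
    static final Comparator<Double> FUZZY =
            (a, b) -> Math.abs(a - b) < 1.0 ? 0 : Double.compare(a, b);

    // Brute-force check of the Comparator contract over a small sample set.
    static <T> boolean obeysContract(Comparator<T> cmp, List<T> samples) {
        for (T x : samples) {
            for (T y : samples) {
                int xy = Integer.signum(cmp.compare(x, y));
                // Sign symmetry: sgn(compare(x, y)) == -sgn(compare(y, x)).
                if (xy != -Integer.signum(cmp.compare(y, x))) {
                    return false;
                }
                for (T z : samples) {
                    int yz = Integer.signum(cmp.compare(y, z));
                    int xz = Integer.signum(cmp.compare(x, z));
                    // Transitivity: x > y and y > z must imply x > z.
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        return false;
                    }
                    // If x and y compare equal, they must order the same against z.
                    if (xy == 0 && xz != yz) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Double> samples = Arrays.asList(0.0, 0.6, 1.2);
        // 0.0 "equals" 0.6 and 0.6 "equals" 1.2, yet 0.0 is less than 1.2.
        System.out.println(obeysContract(FUZZY, samples));                           // false
        System.out.println(obeysContract(Comparator.<Double>naturalOrder(), samples)); // true
    }
}
```

TimSort became the default for object sorts in Java 7, and the older merge sort did not perform this check. A difference in JRE version between the two machines is therefore one possible explanation for the behaviour differing, though the report does not state which versions were in use.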
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)