[ https://issues.apache.org/jira/browse/TIKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285068#comment-13285068 ]
Rob Tulloh commented on TIKA-934:
---------------------------------
Additional evidence of re-entrancy issues:
2012-05-22_19:10:39.31249 Caused by: java.util.ConcurrentModificationException
2012-05-22_19:10:39.31253 at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
2012-05-22_19:10:39.31257 at java.util.HashMap$KeyIterator.next(HashMap.java:828)
2012-05-22_19:10:39.31262 at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
2012-05-22_19:10:39.31266 at org.apache.tika.metadata.Metadata.names(Metadata.java:171)
2012-05-22_19:10:39.31270 at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:156)
2012-05-22_19:10:39.31275 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
2012-05-22_19:10:39.31280 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:281)
2012-05-22_19:10:39.31285 at org.apache.tika.parser.pdf.PDF2XHTML.startPage(PDF2XHTML.java:128)
2012-05-22_19:10:39.31289 at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:420)
2012-05-22_19:10:39.31293 at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
2012-05-22_19:10:39.31296 at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
2012-05-22_19:10:39.31300 at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:63)
2012-05-22_19:10:39.31304 at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:140)
2012-05-22_19:10:39.31308 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-22_19:10:39.31312 ... 4 more
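
The trace shows Metadata.names() copying a HashMap key set with toArray at the moment the exception is thrown. HashMap iterators are fail-fast: if the map is structurally modified after iteration starts, the next step of the iterator throws ConcurrentModificationException. Below is a minimal Java sketch of that mechanism. It is not Tika code; the class FailFastSketch, the method failsFast, and the map contents are hypothetical. The sketch makes the change from the same thread so that the failure is deterministic. In the server the change would presumably come from another thread touching the same Metadata object, which is an assumption consistent with the trace rather than something the trace proves.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for the failure in the trace; not Tika code.
class FailFastSketch {

    // Returns true if a structural change made mid-iteration trips the
    // HashMap iterator's modification check.
    static boolean failsFast() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "application/pdf");
        metadata.put("title", "example");

        Iterator<String> names = metadata.keySet().iterator();
        names.next();                       // iteration has started
        metadata.put("Author", "someone");  // new key: a structural change
        try {
            names.next();                   // iterator detects the change
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + failsFast());
    }
}
```

With two threads the same check fires only when the timing lines up, which would fit a failure that shows up intermittently under load.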
> Tika in server mode stops responding and reports NPE over and over in logs
> --------------------------------------------------------------------------
>
> Key: TIKA-934
> URL: https://issues.apache.org/jira/browse/TIKA-934
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.1
> Environment: CentOS 5.x
> Reporter: Rob Tulloh
> Priority: Critical
>
> We run tika in server mode via:
> /usr/java/jdk/bin/java -Dlog4j.app.name=-server
> -Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl
> -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -server -Xms256M
> -Xmx768M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/log/oom/content-extractor-8983.dump.1 -server -Xms500M
> -Xmx500M -jar /opt/tika/tika-app-1.1.jar --text --encoding=UTF-8 --server 8983
> Our client talks to this server over port 8983: we send data over the socket
> and read the responses back. Sometimes, however, Tika gets into a bad state
> and stops responding.
> When this happens, we see the following in the logs over and over:
> 2012-05-24_20:12:33.88573 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:33.88576 at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:33.88580 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:33.88584 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:33.88589 at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:33.88593 at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:33.88597 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:33.88602 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:33.88606 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:33.88611 ... 4 more
> 2012-05-24_20:12:49.28441 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6906daba
> 2012-05-24_20:12:49.28458 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 2012-05-24_20:12:49.28466 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28477 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-05-24_20:12:49.28489 at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> 2012-05-24_20:12:49.28497 at org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
> 2012-05-24_20:12:49.28509 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:49.28516 at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:49.28524 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:49.28532 at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:49.28541 at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:49.28550 at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:49.28558 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:49.28565 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:49.28577 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28585 ... 4 more
> We have tried to figure out what causes this, without success. We only know
> that once the server gets into this state, there is no recourse but to
> restart the Tika service.
> Other Tika instances we run in the test environment continue to work, so
> some combination of content or workload appears to destabilize Tika. Our
> working theory is that the Tika server is not thread-safe and that this is
> what causes the behavior.
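
If the thread-safety theory holds, one common pattern for avoiding this class of failure is to confine mutable state to a single request, so that no two threads ever touch the same map. The Java sketch below illustrates that pattern with plain JDK classes. It is not Tika's code and not a proposed patch; the class PerRequestStateSketch and the methods handleRequest and runConcurrently are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of per-request state; not Tika code.
class PerRequestStateSketch {

    // All mutable state is created inside the call, so each request
    // iterates only over a map that no other thread can reach.
    static int handleRequest(int requestId) {
        Map<String, String> metadata = new HashMap<>(); // one map per request
        for (int i = 0; i < 100; i++) {
            metadata.put("key-" + requestId + "-" + i, "value");
        }
        String[] names = metadata.keySet().toArray(new String[0]);
        return names.length;
    }

    // Runs the given number of requests on a small thread pool and
    // returns how many names each request saw.
    static List<Integer> runConcurrently(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int id = 0; id < requests; id++) {
                final int requestId = id;
                Callable<Integer> task = () -> handleRequest(requestId);
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8));
    }
}
```

Each task here sees exactly the 100 names it wrote itself, regardless of how the threads interleave, because nothing mutable is shared between tasks.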