[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2223.
----------------------------------
Resolution: Fixed
Committed to trunk in revision 1730808.
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection
> ----------------------------------------------------------------------------
>
> Key: NUTCH-2223
> URL: https://issues.apache.org/jira/browse/NUTCH-2223
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.11
> Reporter: Tien Nguyen Manh
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2223.patch
>
>
> The stack trace for the hang seems to be:
> {code}
> at org.apache.xerces.impl.XMLScanner.scanExternalID(Unknown Source)
> at org.apache.xerces.impl.XMLDocumentScannerImpl.scanDoctypeDecl(Unknown Source)
> at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
> at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
> at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:54)
> at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:41)
> at org.apache.tika.mime.MimeTypes.getMimeType(MimeTypes.java:192)
> at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:439)
> at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:252)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:111)
> {code}
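
For readers following the trace: the XmlRootExtractor frames suggest that MIME detection runs a SAX parse over the start of the document to find its root element, and that the hang occurs while the parser scans the DOCTYPE declaration. The following is a minimal sketch of that kind of root-element extraction using only the JDK's JAXP API. It is not Tika's actual implementation; the class name RootElementSketch, the method extractRoot, and the choice to resolve external entities to an empty source are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of Tika or Nutch.
public class RootElementSketch {

    /** Thrown from the handler to abort parsing once the root element is seen. */
    private static final class RootFound extends SAXException {
        final String name;

        RootFound(String name) {
            super("root element found");
            this.name = name;
        }
    }

    /**
     * Returns the local name of the root element found in the given bytes,
     * or null if the bytes cannot be parsed far enough to reach one.
     */
    public static String extractRoot(byte[] prefix) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(prefix), new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Assumption: skip fetching external DTDs by resolving
                    // every external entity to an empty document.
                    return new InputSource(new StringReader(""));
                }

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes)
                        throws SAXException {
                    // The first startElement event is the root; stop here.
                    throw new RootFound(localName);
                }
            });
        } catch (RootFound found) {
            return found.name;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Not well-formed up to the root element, or parser setup failed.
            return null;
        }
        // Parsing finished without ever seeing an element.
        return null;
    }
}
```

Because all of the scanning happens inside the parser, a defect in DOCTYPE handling would block the parse call itself, so nothing in a caller like this sketch could prevent such a hang. That is consistent with the issue being fixed by upgrading xercesImpl.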