Slava G created TIKA-2727:
-----------------------------
Summary: Parsing and detect mime type of XML file stuck in
infinite loop
Key: TIKA-2727
URL: https://issues.apache.org/jira/browse/TIKA-2727
Project: Tika
Issue Type: Bug
Components: detector, parser
Affects Versions: 1.17
Reporter: Slava G
Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
Hi,
I'm trying to parse (or even just detect the MIME type of) an XML file that is
not large but is somewhat tricky, and my process hangs at:
XMLStringBuffer.append(char[], int, int) line: not available
XMLStringBuffer.append(XMLString) line: not available
XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
XMLNSDocumentScannerImpl.scanStartElement() line: not available
XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
XmlRootExtractor.extractRootElement(InputStream) line: 62
XmlRootExtractor.extractRootElement(byte[]) line: 42
MimeTypes.getMimeType(byte[]) line: 212
MimeTypes.detect(InputStream, Metadata) line: 494
DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
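
The trace above shows detection handing the bytes to a JAXP SAX parser, which is still scanning an attribute value of the root element when the process hangs. The handler's startElement callback only fires once the start tag has been scanned, so aborting from the handler cannot help in this case. As a rough illustration only (this is not Tika's code), the same call path can be approximated with plain JDK classes and guarded with a timeout. The class name GuardedRootExtractor, its method names and the timeout values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: reads the root element name with a
// JDK SAX parser and gives up if the parse does not finish in time.
final class GuardedRootExtractor {

    /** Thrown from the handler to stop parsing once the root element is seen. */
    private static final class RootFound extends SAXException {
        final String name;

        RootFound(String name) {
            super("root element found");
            this.name = name;
        }
    }

    /** Returns the root element's local name, or null if the input is not well-formed. */
    static String extractRoot(byte[] xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName,
                                         Attributes atts) throws SAXException {
                    // Only reached after the whole start tag, attributes included, is scanned.
                    throw new RootFound(local);
                }
            });
        } catch (RootFound found) {
            return found.name;
        } catch (SAXException e) {
            return null; // malformed input
        }
        return null; // no element was reported
    }

    /** Same as extractRoot, but returns null if the parse exceeds timeoutMillis. */
    static String extractRootWithTimeout(byte[] xml, long timeoutMillis) throws Exception {
        // Daemon thread, so a parse that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "xml-root-extractor");
            t.setDaemon(true);
            return t;
        });
        Future<String> result = executor.submit(() -> extractRoot(xml));
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: the parser may ignore the interrupt
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "<a:root xmlns:a=\"urn:example\" id=\"1\"/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(extractRootWithTimeout(sample, 2000));
    }
}
```

The timeout only unblocks the caller. If the parser does not respond to the interrupt, the worker thread may keep consuming CPU in the background, so this works around the hang rather than fixing it.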
Please see the attached XML file.
Please advise.
Thanks