[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625041#comment-16625041 ]

Slava G commented on TIKA-2727:
-------------------------------

Hi,

I'm testing 1.19, and it seems that some files that got stuck in 1.17 now work
fine, but there is still one file that it hangs on.

Is it possible to configure the entity expansion limit in Tika only (you said
it's limited to 20), without specifying -JDjdk.xml.entityExpansionLimit=10?

Thanks.
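For reference, one way to set the limit on a single parser, rather than through a JVM-wide flag, is the JAXP property that the JDK's built-in parser exposes for the same limit. The sketch below is plain JAXP, not Tika's own configuration. It assumes the JDK's default SAXParser is in use (a different parser implementation on the classpath may not recognize the property), and the class name `EntityLimitSketch`, the helper `parsesWithinLimit`, and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityLimitSketch {

    // Per-parser counterpart of the jdk.xml.entityExpansionLimit system property.
    // Assumption: the JDK's built-in parser; other implementations may reject it.
    static final String ENTITY_EXPANSION_LIMIT =
            "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    // Hypothetical sample: &c; -> 10 x &b; -> 100 x &a;, i.e. 111 entity references.
    static final String NESTED = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE r [\n"
            + "  <!ENTITY a \"aaaaaaaaaa\">\n"
            + "  <!ENTITY b \"&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;\">\n"
            + "  <!ENTITY c \"&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;\">\n"
            + "]>\n"
            + "<r>&c;</r>\n";

    /** Returns true if the document parses, false if the parser rejects it. */
    static boolean parsesWithinLimit(String xml, int limit) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
            // Applies only to this parser instance, not to the whole JVM.
            parser.setProperty(ENTITY_EXPANSION_LIMIT, Integer.toString(limit));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("parser cannot be configured with the limit", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // An exceeded limit is reported as a fatal SAXParseException.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("limit 20:   " + parsesWithinLimit(NESTED, 20));
        System.out.println("limit 1000: " + parsesWithinLimit(NESTED, 1000));
    }
}
```

With a limit of 20 the nested sample should be rejected quickly instead of being expanded, while 1000 leaves room for it to parse. Setting the property on each parser keeps the limit local to that parser instead of changing every XML parse in the JVM.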

> Parsing and detect mime type of XML file stuck in infinite loop
> ---------------------------------------------------------------
>
>                 Key: TIKA-2727
>                 URL: https://issues.apache.org/jira/browse/TIKA-2727
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.17
>            Reporter: Slava G
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.19, 2.0.0
>
>         Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
>
> Hi,
> I'm trying to parse (or even just detect the MIME type of) an XML file that is
> not large but is somewhat tricky, and my process hangs at:
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>  
> Please see the attached XML file.
> Please advise.
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
