Thomas Mortagne created TIKA-2395:
-------------------------------------

             Summary: The parser does not support InputStream without built-in mark/reset support anymore
                 Key: TIKA-2395
                 URL: https://issues.apache.org/jira/browse/TIKA-2395
             Project: Tika
          Issue Type: Bug
          Components: detector, parser
    Affects Versions: 1.15
            Reporter: Thomas Mortagne
            Priority: Blocker


After upgrading to 1.15 (from 1.14), the detector no longer seems to support 
every kind of InputStream the way it used to.

I get a large number of exceptions like this one:

{noformat}
org.apache.tika.io.TaggedIOException: mark/reset not supported
        at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
        at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
        at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
        at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
        at org.apache.tika.Tika.parseToString(Tika.java:527)
        at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
        at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
        at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
        at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
        at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
        at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
        at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: mark/reset not supported
        at java.io.InputStream.reset(InputStream.java:348)
        at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
        at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
        ... 13 common frames omitted
{noformat}
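
For illustration, the failure mode in the trace and one possible caller-side workaround can be sketched with plain java.io. This is not Tika code: the class MarkResetSketch and its helper methods are hypothetical names, and the assumption is that the stream handed to Tika.parseToString reports markSupported() == false. Whether buffering on the caller side is an acceptable fix for this issue is also an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetSketch {

    /** Hypothetical stand-in for a stream that has no mark/reset support. */
    public static InputStream withoutMarkSupport(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override public boolean markSupported() { return false; }
            @Override public void mark(int readLimit) { /* ignored, like java.io.InputStream */ }
            @Override public void reset() throws IOException {
                // Same message as java.io.InputStream.reset() in the trace above.
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Returns the stream unchanged if it can rewind, otherwise adds buffering. */
    public static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Mimics what a detector does: read a prefix, then rewind the stream. */
    public static byte[] peek(InputStream in, int length) throws IOException {
        in.mark(length);
        byte[] prefix = new byte[length];
        int total = 0;
        int n;
        while (total < length && (n = in.read(prefix, total, length - total)) != -1) {
            total += n;
        }
        in.reset(); // throws IOException when mark/reset is not supported
        return Arrays.copyOf(prefix, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);

        try {
            peek(withoutMarkSupport(data), 4);
        } catch (IOException e) {
            System.out.println("raw stream: " + e.getMessage());
        }

        InputStream safe = ensureMarkSupported(withoutMarkSupport(data));
        System.out.println("peeked: " + new String(peek(safe, 4), StandardCharsets.US_ASCII));
        System.out.println("next byte after reset: " + (char) safe.read());
    }
}
```

In the caller, the same idea would mean passing ensureMarkSupported(stream) instead of the raw stream. That only works around the problem; it does not explain why the behaviour changed between releases.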

This regression makes Tika unusable for us.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)