Thomas Mortagne created TIKA-2395:
-------------------------------------
Summary: The parser does not support InputStream without built in mark/reset support anymore
Key: TIKA-2395
URL: https://issues.apache.org/jira/browse/TIKA-2395
Project: Tika
Issue Type: Bug
Components: detector, parser
Affects Versions: 1.15
Reporter: Thomas Mortagne
Priority: Blocker
After upgrading to 1.15 (from 1.14), the detector no longer seems to support
all kinds of InputStream the way it used to: streams without built-in
mark/reset support now fail during type detection.
I get many exceptions like this one:
{noformat}
org.apache.tika.io.TaggedIOException: mark/reset not supported
	at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
	at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
	at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
	at org.apache.tika.Tika.parseToString(Tika.java:527)
	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: mark/reset not supported
	at java.io.InputStream.reset(InputStream.java:348)
	at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
	... 13 common frames omitted
{noformat}
This regression makes Tika unusable for us.
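
For anyone hitting the same trace, a possible caller-side stopgap (a sketch only, not verified against Tika 1.15) is to make sure the stream supports mark/reset before handing it to Tika, for example by wrapping it in a java.io.BufferedInputStream. The class MarkResetWorkaround, the helper ensureMarkSupported and the NoMarkStream test stream below are hypothetical names introduced for illustration; only JDK classes are used, so the example stands in for the real call site (Tika.parseToString in the trace above) rather than reproducing it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetWorkaround {

    /** Hypothetical test stream that mimics an InputStream without mark/reset support. */
    static final class NoMarkStream extends FilterInputStream {
        NoMarkStream(InputStream in) {
            super(in);
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public synchronized void mark(int readLimit) {
            // Intentionally a no-op, like the default java.io.InputStream.mark().
        }

        @Override
        public synchronized void reset() throws IOException {
            // Same message as the default java.io.InputStream.reset().
            throw new IOException("mark/reset not supported");
        }
    }

    /** Returns the stream unchanged if it supports mark/reset, otherwise a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        try (InputStream safe = ensureMarkSupported(new NoMarkStream(new ByteArrayInputStream(data)))) {
            safe.mark(16);            // remember the current position
            int first = safe.read();  // consume one byte
            safe.reset();             // rewind to the mark; no exception on the wrapper
            int again = safe.read();  // the same byte is read again
            System.out.println(first == again);
        }
    }
}
```

In the XWiki code from the trace, the assumption would be to pass the wrapped stream to Tika instead of the raw one. That only hides the symptom on the caller side; the regression in the detector would still need a fix in Tika itself.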