[
https://issues.apache.org/jira/browse/TIKA-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051983#comment-16051983
]
Thomas Mortagne commented on TIKA-2395:
---------------------------------------
bq. If you could look through your dependencies and make sure that old Tika
bits aren't lying around, that might help.
Indeed, when you look at /webapps/xwiki/WEB-INF/lib you can see a
tika-langdetect-1.13 jar (tika-core and tika-parsers are fine). I looked at
where it's coming from: it is a new transitive dependency of tika-parsers
(which is why I did not notice it), actually pulled in by
sentiments-analysis-parser.
The tika-parsers pom needs to set up a <dependencyManagement> entry for
tika-langdetect to make sure it's 1.15, or many people will hit the same issue.
I created TIKA-2397.
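Until that is fixed, a downstream build could probably pin the version itself.
The fragment below is only a sketch of what that might look like in the
consuming project's own pom.xml. It assumes the artifact coordinates are
org.apache.tika:tika-langdetect and relies on standard Maven behavior, where
the root project's <dependencyManagement> overrides transitive versions. I have
not verified it against the XWiki build.

```xml
<!-- Sketch for a downstream pom.xml (not the tika-parsers pom itself). -->
<!-- Assumption: tika-langdetect is published as org.apache.tika:tika-langdetect. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-langdetect</artifactId>
      <!-- Keep in line with the tika-core / tika-parsers version in use. -->
      <version>1.15</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree -Dincludes=org.apache.tika afterwards should show
whether every Tika artifact now resolves to the same version.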
> The parser does not support InputStream without built in mark/reset support
> anymore
> -----------------------------------------------------------------------------------
>
> Key: TIKA-2395
> URL: https://issues.apache.org/jira/browse/TIKA-2395
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Affects Versions: 1.15
> Reporter: Thomas Mortagne
> Priority: Blocker
>
> After upgrading to 1.15 (from 1.14) it seems that the detector does not properly
> support all kinds of InputStream like it used to.
> I get tons of:
> {noformat}
> org.apache.tika.io.TaggedIOException: mark/reset not supported
> at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
> at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
> at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
> at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
> at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
> at org.apache.tika.Tika.parseToString(Tika.java:527)
> at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
> at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
> at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
> at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
> at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
> at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
> at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: mark/reset not supported
> at java.io.InputStream.reset(InputStream.java:348)
> at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
> at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
> ... 13 common frames omitted
> {noformat}
> This regression makes tika unusable for us.