[ https://issues.apache.org/jira/browse/TIKA-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050829#comment-16050829 ]
Thomas Mortagne commented on TIKA-2395:
---------------------------------------

There is not really any unit test that triggers this error; it only happens in a running instance of XWiki (plus I reverted to 1.14 for now).

I will try to set up something simple to reproduce the issue when I can find some time (but that might be in quite some time...).

In the meantime, if someone is motivated, it's possible to reproduce and debug the error the following way:

* download https://mortagne.org/files/xwiki-platform-distribution-flavor-jetty-hsqldb-9.5-20170615.113126-17.zip (requires Java 8)
* execute start_xwiki_debug.sh; this will start Java in debug mode and you can attach to it on port 5005
* go to http://127.0.0.1:8080/xwiki; it will start scanning wiki pages, parse the attachments and produce the errors I pasted in the description

Someone could attach to the running instance before it starts scanning attachments, put a breakpoint in Tika#parseToString and debug from there.

> The parser does not support InputStream without built in mark/reset support anymore
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-2395
>                 URL: https://issues.apache.org/jira/browse/TIKA-2395
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.15
>            Reporter: Thomas Mortagne
>            Priority: Blocker
>
> After upgrading to 1.15 (from 1.14) it seems that the detector does not properly
> support all kinds of InputStream like it used to.
> I get tons of:
> {noformat}
> org.apache.tika.io.TaggedIOException: mark/reset not supported
>     at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
>     at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
>     at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
>     at org.apache.tika.Tika.parseToString(Tika.java:527)
>     at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
>     at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: mark/reset not supported
>     at java.io.InputStream.reset(InputStream.java:348)
>     at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
>     ... 13 common frames omitted
> {noformat}
> This regression makes tika unusable for us.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
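
For anyone who wants to see the root cause from the trace without starting XWiki, here is a minimal JDK-only sketch. It does not call Tika at all. It only shows that a stream inheriting the default `java.io.InputStream.reset()` fails with "mark/reset not supported" when something marks it, reads a few leading bytes and resets it, and that wrapping it in a `BufferedInputStream` makes the same sequence succeed. The class and method names (`MarkResetSketch`, `withoutMarkSupport`, `peek`, `ensureMarkSupported`) are hypothetical, and `peek` is only an assumed stand-in for what a detector does, not Tika's actual code. Whether such wrapping is an acceptable workaround for this issue is also an assumption, not something established in this thread.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MarkResetSketch {

    /** Hypothetical stream that keeps the default InputStream mark/reset behaviour. */
    static InputStream withoutMarkSupport(byte[] data) {
        final InputStream source = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return source.read();
            }
            // markSupported(), mark() and reset() are inherited from InputStream:
            // markSupported() returns false and reset() throws IOException.
        };
    }

    /** Assumed stand-in for detection: mark, read up to 'length' bytes, then reset. */
    static byte[] peek(InputStream in, int length) throws IOException {
        in.mark(length);
        byte[] head = new byte[length];
        int total = 0;
        while (total < length) {
            int n = in.read(head, total, length - total);
            if (n < 0) {
                break; // end of stream before 'length' bytes were available
            }
            total += n;
        }
        in.reset(); // throws if the stream has no mark/reset support
        return Arrays.copyOf(head, total);
    }

    /** Wraps the stream only when it cannot be marked itself. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);

        try {
            peek(withoutMarkSupport(data), 4);
        } catch (IOException e) {
            System.out.println("raw stream failed: " + e.getMessage());
        }

        InputStream wrapped = ensureMarkSupported(withoutMarkSupport(data));
        byte[] head = peek(wrapped, 4);
        System.out.println("wrapped stream peeked: "
                + new String(head, StandardCharsets.US_ASCII));
    }
}
```

The wrapper is applied only when `markSupported()` returns false, so streams that already support mark/reset are passed through untouched.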