[ https://issues.apache.org/jira/browse/TIKA-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051964#comment-16051964 ]
Tim Allison commented on TIKA-2395:
-----------------------------------

I was able to reproduce this with your zip and directions. Thank you!

The decompiled TikaInputStream that is being used looks old to me, and I'm also getting a "Decompiled code does not match source" warning from IntelliJ. This by itself does not explain the exception, but you might want to check that there aren't some old Tika bits hanging around.

{noformat}
    public static TikaInputStream get(InputStream stream, TemporaryResources tmp) {
        if(stream == null) {
            throw new NullPointerException("The Stream must not be null");
        } else if(stream instanceof TikaInputStream) {
            return (TikaInputStream)stream;
        } else {
            if(!(stream instanceof BufferedInputStream) && !(stream instanceof ByteArrayInputStream)) {
                stream = new BufferedInputStream((InputStream)stream);
            }

            return new TikaInputStream((InputStream)stream, tmp, -1L);
        }
    }
{noformat}

The InputStream is an AutoCloseInputStream, which, again, doesn't explain the problem. I'm not able to reproduce the problem in Tika trunk by wrapping either a ByteArrayInputStream or a FileInputStream in an AutoCloseInputStream.

I also tried modifying TikaInputStream back to the old {{if ... instanceof}} code, and I couldn't reproduce the problem in straight Tika.

If you could look through your dependencies and make sure that old Tika bits aren't lying around, that might help.

> The parser does not support InputStream without built in mark/reset support anymore
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-2395
>                 URL: https://issues.apache.org/jira/browse/TIKA-2395
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.15
>            Reporter: Thomas Mortagne
>            Priority: Blocker
>
> After upgrade to 1.5 (from 1.4) it seems that the detector does not properly support all kinds of InputStream like it used to.
> I get tons of:
> {noformat}
> org.apache.tika.io.TaggedIOException: mark/reset not supported
>     at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
>     at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
>     at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
>     at org.apache.tika.Tika.parseToString(Tika.java:527)
>     at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
>     at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: mark/reset not supported
>     at java.io.InputStream.reset(InputStream.java:348)
>     at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
>     ... 13 common frames omitted
> {noformat}
> This regression makes tika unusable for us.
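
For anyone following along, the root "Caused by" in the trace is simply what java.io.InputStream.reset() does by default when a stream has no mark/reset support. Below is a minimal, JDK-only sketch of that behaviour and of why the BufferedInputStream wrapping in the decompiled get() would normally prevent it. It does not use Tika or commons-io; MarkResetSketch, NoMarkInputStream, peek and ensureBuffered are hypothetical names made up for illustration, and ensureBuffered only copies the instanceof condition from the decompiled method, not the rest of TikaInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    /** Hypothetical stand-in for a stream that has no mark/reset support. */
    static class NoMarkInputStream extends InputStream {
        private final InputStream in;

        NoMarkInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }
        // mark(), reset() and markSupported() are inherited from InputStream:
        // markSupported() returns false and reset() throws an IOException.
    }

    /** Marks, consumes two bytes, then tries to rewind, as a detector would. */
    static String peek(InputStream stream) {
        try {
            stream.mark(16);
            stream.read();
            stream.read();
            stream.reset();
            return "reset ok, next byte = " + (char) stream.read();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    /** Same instanceof condition as the decompiled get() quoted in the comment. */
    static InputStream ensureBuffered(InputStream stream) {
        if (!(stream instanceof BufferedInputStream) && !(stream instanceof ByteArrayInputStream)) {
            return new BufferedInputStream(stream);
        }
        return stream;
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);

        // Unwrapped: reset() fails because the stream cannot rewind.
        System.out.println(peek(new NoMarkInputStream(new ByteArrayInputStream(data))));

        // Wrapped: BufferedInputStream supplies mark/reset on top of the raw stream.
        System.out.println(peek(ensureBuffered(new NoMarkInputStream(new ByteArrayInputStream(data)))));
    }
}
```

If the wrapping step runs, the second call can rewind, so a failure like the one in the trace suggests that reset() is reaching an unbuffered stream somewhere in the delegate chain. That would be consistent with, though it does not prove, the mixed-versions theory above.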