[ 
https://issues.apache.org/jira/browse/TIKA-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051964#comment-16051964
 ] 

Tim Allison commented on TIKA-2395:
-----------------------------------

I was able to reproduce this with your zip and directions.  Thank you!

The decompiled TikaInputStream that is being used looks old to me, and I'm also 
getting a "Decompiled code does not match source" warning from IntelliJ.  

This by itself does not explain the exception, but you might want to check 
that there aren't some old Tika bits hanging around.  

{noformat}
    public static TikaInputStream get(InputStream stream, TemporaryResources tmp) {
        if(stream == null) {
            throw new NullPointerException("The Stream must not be null");
        } else if(stream instanceof TikaInputStream) {
            return (TikaInputStream)stream;
        } else {
            if(!(stream instanceof BufferedInputStream) && !(stream instanceof ByteArrayInputStream)) {
                stream = new BufferedInputStream((InputStream)stream);
            }

            return new TikaInputStream((InputStream)stream, tmp, -1L);
        }
    }
{noformat}

The InputStream is an AutoCloseInputStream, which, again, doesn't explain the 
problem.

I'm not able to reproduce the problem in Tika trunk by wrapping either a 
ByteArrayInputStream or a FileInputStream in an AutoCloseInputStream.  I also 
tried modifying TikaInputStream back to the old {{if ... instanceof}} code, and 
I still couldn't reproduce the problem in plain Tika.
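
For anyone who wants to poke at the mark/reset behaviour without Tika on the 
classpath, here is a minimal sketch using only java.io. It is not Tika code: 
{{NoMarkStream}}, {{ensureMarkSupported}} and {{canPeek}} are hypothetical names, 
and {{NoMarkStream}} is only a stand-in for a wrapper that reports no mark/reset 
support. {{ensureMarkSupported}} mirrors the {{instanceof}} check from the 
decompiled method above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    /** Hypothetical stand-in for a wrapper stream without mark/reset support. */
    static final class NoMarkStream extends FilterInputStream {
        NoMarkStream(InputStream in) {
            super(in);
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public synchronized void mark(int readlimit) {
            // intentionally a no-op, like java.io.InputStream.mark
        }

        @Override
        public synchronized void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    /** Same idea as the instanceof check in the decompiled get(): buffer unless already buffered. */
    static InputStream ensureMarkSupported(InputStream stream) {
        if (stream == null) {
            throw new NullPointerException("The Stream must not be null");
        }
        if (!(stream instanceof BufferedInputStream) && !(stream instanceof ByteArrayInputStream)) {
            return new BufferedInputStream(stream);
        }
        return stream;
    }

    /** Marks, reads n bytes, resets, and checks that the first byte can be read again. */
    static boolean canPeek(InputStream stream, int n) {
        try {
            stream.mark(n);
            int first = stream.read();
            for (int i = 1; i < n; i++) {
                stream.read();
            }
            stream.reset();
            return stream.read() == first;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);

        // Unwrapped: reset() throws, so the peek fails and this prints false.
        InputStream raw = new NoMarkStream(new ByteArrayInputStream(data));
        System.out.println("raw:      " + canPeek(raw, 8));

        // Wrapped in a BufferedInputStream: the peek succeeds and this prints true.
        InputStream buffered = ensureMarkSupported(new NoMarkStream(new ByteArrayInputStream(data)));
        System.out.println("buffered: " + canPeek(buffered, 8));
    }
}
```

With the buffering wrapper in place, the reset succeeds even though the 
underlying stream does not support it, which is why the stream type alone 
should not be enough to trigger the exception.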

If you could look through your dependencies and make sure that old Tika bits 
aren't lying around, that might help.
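
One possible way to do that check from inside the running application is 
sketched below. It uses only standard JDK calls; the class name {{ClassOrigin}} 
and its methods are hypothetical helpers, not part of Tika. {{describe}} reports 
where the class that actually got loaded came from, and {{copiesOf}} lists 
every copy of the class file visible to the class loader, so more than one 
entry would suggest duplicate jars.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClassOrigin {

    /** Describes the jar or directory the named class was loaded from. */
    static String describe(String className) {
        try {
            // initialize=false: we only want to locate the class, not run its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return className + " <- (bootstrap or unknown location)";
            }
            return className + " <- " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- (not on classpath)";
        }
    }

    /** Lists every location where the class file for the named class is visible. */
    static List<String> copiesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        List<String> found = new ArrayList<>();
        try {
            for (URL url : Collections.list(ClassOrigin.class.getClassLoader().getResources(resource))) {
                found.add(url.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String name = "org.apache.tika.io.TikaInputStream";
        System.out.println(describe(name));
        for (String copy : copiesOf(name)) {
            System.out.println("  copy: " + copy);
        }
    }
}
```

In a container with several class loaders this only sees what the loader of 
{{ClassOrigin}} sees, so it would need to run from the same module or webapp 
that calls Tika.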

> The parser does not support InputStream without built in mark/reset support 
> anymore
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-2395
>                 URL: https://issues.apache.org/jira/browse/TIKA-2395
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.15
>            Reporter: Thomas Mortagne
>            Priority: Blocker
>
> After upgrading to 1.15 (from 1.14), it seems that the detector does not 
> properly support all kinds of InputStream like it used to.
> I get tons of:
> {noformat}
> org.apache.tika.io.TaggedIOException: mark/reset not supported
>       at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
>       at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
>       at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
>       at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
>       at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
>       at org.apache.tika.Tika.parseToString(Tika.java:527)
>       at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
>       at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
>       at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
>       at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
>       at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
>       at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
>       at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: mark/reset not supported
>       at java.io.InputStream.reset(InputStream.java:348)
>       at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
>       at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
>       ... 13 common frames omitted
> {noformat}
> This regression makes Tika unusable for us.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
