Hi Jukka,

See https://issues.apache.org/jira/browse/TIKA-881
The fix that Klaus provided avoids calling reset() on the input stream. But I thought Tika tries to wrap streams so that reset() works correctly, since otherwise content auto-detection can fail.

I haven't had to dig into the tricky issues around stream management, so I'm hoping you can take a look at Klaus's report and comment.

Thanks!

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
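For context, here is a minimal, generic sketch of the mark()/reset() pattern that auto-detection depends on. It uses only java.io.BufferedInputStream, not Tika's own stream wrapper, and it does not reproduce Tika's detection logic. The class name, the 8-byte peek limit, and the toy "%PDF" check are illustrative assumptions. The point is that the detector peeks at the leading bytes and then rewinds, so the parser still sees the full stream. If the stream doesn't support mark/reset and isn't wrapped, that rewind fails.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetSketch {
    // Hypothetical: how many leading bytes the toy detector may inspect.
    static final int PEEK_LIMIT = 8;

    // Wrap the stream only if it can't already do mark()/reset().
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Toy detector (not Tika's): peek at the first bytes, then rewind
    // so that downstream code still reads the stream from the start.
    static String detect(InputStream in) throws IOException {
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int n = in.readNBytes(head, 0, PEEK_LIMIT);
        in.reset();
        byte[] pdfMagic = "%PDF".getBytes(StandardCharsets.US_ASCII);
        if (n >= pdfMagic.length
                && Arrays.equals(Arrays.copyOf(head, pdfMagic.length), pdfMagic)) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 body".getBytes(StandardCharsets.US_ASCII);
        // Simulate a caller-supplied stream that does NOT support mark/reset.
        InputStream raw = new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
        InputStream in = ensureMarkSupported(raw);
        String type = detect(in);
        // The full content is still available after detection.
        String content = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        System.out.println(type + " " + content);
    }
}
```

Without the BufferedInputStream wrapper, calling reset() on the simulated stream would throw an IOException, which is the kind of failure that avoiding reset() altogether sidesteps.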