Tim Allison created TIKA-4106:
---------------------------------
Summary: Digesting and content length on embedded ole/zip/pdf
files are not calculated
Key: TIKA-4106
URL: https://issues.apache.org/jira/browse/TIKA-4106
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
The digester currently sits on the parser. The problem is that the detectors
for some file formats open the full file and store the resulting object in the
openContainer of the TikaInputStream. Parsers that reuse that openContainer
(created by the detector) never read the InputStream, so the digester never
sees the bytes.
As a result, embedded OLE2, Zip (in some circumstances), and PDF(?) files are
never digested, and their stream lengths are never extracted.
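
To illustrate the mechanism, here is a minimal sketch in plain Java. It is not
Tika code: SketchStream, its openContainer field, parseByReading and
parseFromContainer are hypothetical stand-ins for the TikaInputStream, its
openContainer, and the two kinds of parser described above. It assumes the
digest and length are computed only from bytes that pass through read(), and
that the "detector" obtained its container by a route that bypasses the
wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OpenContainerSketch {

    /** Hypothetical stand-in for a stream that can carry an already-opened container. */
    static final class SketchStream extends DigestInputStream {
        long bytesRead = 0;    // stream length as observed through read()
        Object openContainer;  // set by the "detector", reused by the "parser"

        SketchStream(InputStream in, MessageDigest md) {
            super(in, md);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }
    }

    /** Wraps the data in a digesting, counting stream. */
    static SketchStream open(byte[] data) throws NoSuchAlgorithmException {
        return new SketchStream(new ByteArrayInputStream(data),
                MessageDigest.getInstance("SHA-256"));
    }

    /** A parser that drains the stream, so the wrapper sees every byte. */
    static long parseByReading(SketchStream s) throws IOException {
        byte[] buf = new byte[4096];
        while (s.read(buf, 0, buf.length) != -1) {
            // bytes are digested and counted as a side effect of reading
        }
        return s.bytesRead;
    }

    /** A parser that reuses the container and never touches the stream. */
    static long parseFromContainer(SketchStream s) {
        byte[] container = (byte[]) s.openContainer;
        // ... parsing would work on `container` here; read() is never called ...
        return s.bytesRead;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "embedded file".getBytes(StandardCharsets.UTF_8);

        SketchStream read = open(data);
        System.out.println("bytes seen when stream is read: " + parseByReading(read));

        SketchStream reused = open(data);
        reused.openContainer = data.clone(); // assumption: detector bypassed the wrapper
        System.out.println("bytes seen when container is reused: "
                + parseFromContainer(reused));
    }
}
```

In the second case the wrapper has observed no bytes, so any digest or length
derived from it describes an empty stream rather than the embedded file.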