Tim Allison created TIKA-4106:
---------------------------------

             Summary: Digesting and content length on embedded ole/zip/pdf 
files are not calculated
                 Key: TIKA-4106
                 URL: https://issues.apache.org/jira/browse/TIKA-4106
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


We currently run the digester at the parser level.  The problem is that the
detectors for some file formats open the full file and cache the resulting
object in the openContainer of the TikaInputStream.  Parsers that reuse that
openContainer (created by the detector) never read the InputStream, so the
digester never sees the bytes.

As a result, embedded OLE2, Zip (in some circumstances) and PDF(?) files are
never digested, and their stream lengths are never extracted.
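
The failure mode can be reproduced without Tika at all.  The following is a
minimal sketch, not Tika's actual code: DigestSketch, digestWhileParsing and
the parserUsesCachedContainer flag are hypothetical names, and the JDK's
DigestInputStream stands in for whatever digester wraps the stream.  It only
illustrates that a digest and length computed as a side effect of reading
stay empty when the consumer works from an already opened object instead of
the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the digester/parser interaction; not Tika's API.
class DigestSketch {

    // Returns { hex SHA-256, byte count } as observed through the wrapped stream.
    // When parserUsesCachedContainer is true, the "parser" never touches the
    // stream, mimicking a parser that reuses an object cached by the detector.
    static String[] digestWhileParsing(byte[] data, boolean parserUsesCachedContainer) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            long length = 0;
            try (DigestInputStream in =
                         new DigestInputStream(new ByteArrayInputStream(data), md)) {
                if (!parserUsesCachedContainer) {
                    // Normal case: the parser reads the stream, so every byte
                    // passes through the digest and is counted.
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        length += n;
                    }
                }
                // Cached-container case: nothing is read, nothing is digested.
            }
            return new String[] { hex(md.digest()), Long.toString(length) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hex encoding of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "embedded".getBytes(StandardCharsets.UTF_8);
        String[] read = digestWhileParsing(data, false);
        String[] skipped = digestWhileParsing(data, true);
        System.out.println("stream read:      " + read[0] + " length=" + read[1]);
        System.out.println("container reused: " + skipped[0] + " length=" + skipped[1]);
    }
}
```

In the second call the reported digest is simply the SHA-256 of zero bytes
and the length is 0, which is the kind of silent gap described above.  One
possible direction, offered only as an assumption, is to compute the digest
and length before or independently of detection rather than as a side effect
of the parser reading the stream.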



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
