[ https://issues.apache.org/jira/browse/TIKA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758702#comment-17758702 ]

ASF GitHub Bot commented on TIKA-4106:
--------------------------------------

tballison merged PR #1303:
URL: https://github.com/apache/tika/pull/1303

> Digesting and content length on embedded ole/zip/pdf files are not calculated
> -----------------------------------------------------------------------------
>
>                 Key: TIKA-4106
>                 URL: https://issues.apache.org/jira/browse/TIKA-4106
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently put the digester on the parser. The problem is that, for some
> file formats, the detector opens the full file and stores the resulting
> object in the openContainer of the TikaInputStream. Parsers that reuse that
> openContainer (created by the detector) never read the InputStream itself,
> so the digester never sees its bytes.
>
> As a result, embedded OLE2, Zip (in some circumstances) and PDF(?) files
> are never digested, and their stream lengths are never extracted.
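A minimal, Tika-free sketch of the failure mode described above, assuming the digest and length are computed by wrapping the stream the parser reads (here with the JDK's java.security.DigestInputStream). DigestBypassDemo, parse and the detectorOpenedContainer flag are hypothetical stand-ins, not Tika APIs; the sketch only illustrates the problem and does not reflect the fix merged in PR #1303.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration: a digest that only sees bytes read through the
// stream misses files whose parser reuses a container the detector opened.
public class DigestBypassDemo {

    // Result of one simulated parse: bytes pulled through the stream and their SHA-256.
    static final class Result {
        final long bytesRead;
        final String sha256Hex;

        Result(long bytesRead, String sha256Hex) {
            this.bytesRead = bytesRead;
            this.sha256Hex = sha256Hex;
        }
    }

    // "Digester on the parser": the digest wraps the stream, so it sees only what
    // the parser reads. If detectorOpenedContainer is true, the (hypothetical)
    // parser works from the already-open container and never reads the stream.
    static Result parse(byte[] file, boolean detectorOpenedContainer)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long count = 0;
        try (InputStream in = new DigestInputStream(new ByteArrayInputStream(file), md)) {
            if (!detectorOpenedContainer) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    count += n; // bytes flow through the stream: digest and length update
                }
            }
            // otherwise the stream is never read, so the digest stays at its initial state
        }
        return new Result(count, toHex(md.digest()));
    }

    // Lowercase hex encoding of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] file = "pretend this is an embedded OLE2 file".getBytes(StandardCharsets.UTF_8);
        Result streamed = parse(file, false);
        Result bypassed = parse(file, true);
        System.out.println("read via stream:  " + streamed.bytesRead + " bytes, " + streamed.sha256Hex);
        System.out.println("container reused: " + bypassed.bytesRead + " bytes, " + bypassed.sha256Hex);
    }
}
```

In the first case the byte count equals the file size and the digest covers the file. In the second case the count is 0 and the digest is the SHA-256 of empty input: the same "never digested, no stream length" outcome the issue reports.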



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
