[ 
https://issues.apache.org/jira/browse/TIKA-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745706#comment-17745706
 ] 

Tim Allison commented on TIKA-4103:
-----------------------------------

The problem is that 477727.ppt contains 14 embedded files that all share the 
MD5 of zero-byte content (d41d8cd98f00b204e9800998ecf8427e), so the digest 
alone cannot tell them apart.
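
To illustrate the collision, here is a minimal Python sketch (not tika-eval's 
actual code). The helper names and the sample attachment list are hypothetical; 
the only facts taken from above are the count of 14 and the digest value. It 
shows that every empty attachment hashes to the same MD5, so a digest-keyed 
lookup collapses all 14 into a single bucket.

```python
import hashlib
from collections import defaultdict

# MD5 of zero bytes of input, as quoted in the comment above.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def group_by_digest(attachments):
    """Map each digest to the indices of the attachments that share it."""
    groups = defaultdict(list)
    for index, data in enumerate(attachments):
        groups[md5_hex(data)].append(index)
    return dict(groups)


# Hypothetical stand-in: 14 zero-byte embedded files plus one non-empty one.
attachments = [b""] * 14 + [b"non-empty attachment"]
groups = group_by_digest(attachments)

# All 14 empty attachments land under one key, so the digest alone
# cannot say which one pairs with which in the other extract.
print(len(groups[EMPTY_MD5]))
```

Any alignment keyed only on the digest would need a secondary signal to 
separate the entries in that bucket, for example position in the embedded-file 
order. That choice is an assumption here, not something decided in this issue.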

> Improve alignment algorithm for attachments in tika-eval Compare
> ----------------------------------------------------------------
>
>                 Key: TIKA-4103
>                 URL: https://issues.apache.org/jira/browse/TIKA-4103
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: TIKA-4103.tgz
>
>
> In the recent regression tests, [~tilman] noticed misalignments in 
> attachments in (at least) two files:
> govdocs1/912/912801.ppt
> govdocs1/477/477727.ppt
> We need to fix the alignment algorithm.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
