[ https://issues.apache.org/jira/browse/TIKA-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745706#comment-17745706 ]
Tim Allison commented on TIKA-4103:
-----------------------------------
The problem is that 14 embedded files in 477727.ppt share the same MD5, the
digest of zero-byte content (d41d8cd98f00b204e9800998ecf8427e)...
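
To illustrate why identical digests make alignment ambiguous, here is a minimal sketch. It assumes attachments are matched by content digest, which the comment suggests but does not spell out. It is not the tika-eval implementation; the helper names and the sample attachment list are hypothetical.

```python
import hashlib
from collections import defaultdict

# MD5 of zero-byte input; this is the digest quoted in the comment.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def group_by_digest(attachments):
    """Map each MD5 hex digest to the positions of the attachments that share it."""
    groups = defaultdict(list)
    for index, content in enumerate(attachments):
        groups[hashlib.md5(content).hexdigest()].append(index)
    return dict(groups)


def ambiguous_digests(attachments):
    """Return only the digests shared by more than one attachment.

    A digest in this result cannot identify a single attachment, so a
    matcher keyed on digest alone has no unique partner to pick.
    """
    return {
        digest: positions
        for digest, positions in group_by_digest(attachments).items()
        if len(positions) > 1
    }


# Hypothetical container: 14 empty embedded files plus two with content.
attachments = [b""] * 14 + [b"slide notes", b"chart data"]
collisions = ambiguous_digests(attachments)

print(EMPTY_MD5)                   # d41d8cd98f00b204e9800998ecf8427e
print(len(collisions[EMPTY_MD5]))  # 14
```

With 14 candidates behind one key, a digest-only matcher needs a tie-breaker, for example the position of the attachment within the container, to pair files consistently between two runs.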
> Improve alignment algorithm for attachments in tika-eval Compare
> ----------------------------------------------------------------
>
> Key: TIKA-4103
> URL: https://issues.apache.org/jira/browse/TIKA-4103
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Minor
> Attachments: TIKA-4103.tgz
>
>
> In the recent regression tests, [~tilman] noticed misalignments in
> attachments in (at least) two files:
> govdocs1/912/912801.ppt
> govdocs1/477/477727.ppt
> We need to fix the alignment algorithm.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)