[ https://issues.apache.org/jira/browse/TIKA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2852:
------------------------------
Description:
We currently include reports for differences in attachment counts. It would
also be useful to report on the number of "unaligned" files by mime type.
Consider two extracts of the same file, which contains attachments, produced by
different versions of Tika.
{noformat}
ExtractA              ExtractB
msword (container)    msword
  /emf                  /zip
  /emf                  /txt
  /zip
  /txt
{noformat}
We know from the current reports that msword files are missing attachments in
ExtractB. It would be useful to know that 2 emfs went missing in ExtractB or,
more generally, to count the missing attachments by mime type for both the A
run and the B run.
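As a rough illustration of the requested count (a minimal sketch, not tika-eval's implementation), each extract's attachments can be treated as a multiset of mime types and subtracted in both directions. The function name `unaligned_by_mime` and the short mime labels are hypothetical and simply mirror the example above.

```python
from collections import Counter


def unaligned_by_mime(attachments_a, attachments_b):
    """Return (missing_in_b, missing_in_a) as Counters keyed by mime type.

    Hypothetical helper: inputs are lists of attachment mime labels
    for the same container file in the A run and the B run.
    """
    a = Counter(attachments_a)
    b = Counter(attachments_b)
    # Counter subtraction drops zero and negative counts, so each result
    # holds only the attachments that have no counterpart in the other run.
    return a - b, b - a


# Attachment mimes from the example: ExtractA has two emfs that ExtractB lacks.
extract_a = ["emf", "emf", "zip", "txt"]
extract_b = ["zip", "txt"]

missing_in_b, missing_in_a = unaligned_by_mime(extract_a, extract_b)
print(dict(missing_in_b))  # {'emf': 2}
print(dict(missing_in_a))  # {}
```

Summing these per-file counters across all container files would give the per-mime totals for each run. Note that this matches attachments by mime type only, not by position or name.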
was:
We currently include reports for differences in attachment counts. It would
also be useful to report on the number of "unaligned" files by mime type.
Consider two extracts of the same file, which contains attachments, produced by
different versions of Tika.
{noformat}
ExtractA              ExtractB
msword (container)    msword
  /emf                  /zip
  /emf
  /zip
{noformat}
We know from the current reports that msword files are missing attachments in
ExtractB. It would be useful to know that 2 emfs went missing in ExtractB or,
more generally, to count the missing attachments by mime type for both the A
run and the B run.
> Add reports for missing/unaligned files in tika-eval Compare mode
> -----------------------------------------------------------------
>
> Key: TIKA-2852
> URL: https://issues.apache.org/jira/browse/TIKA-2852
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major