Tim Allison created TIKA-2852:
---------------------------------

             Summary: Add reports for missing/unaligned files in tika-eval Compare mode
                 Key: TIKA-2852
                 URL: https://issues.apache.org/jira/browse/TIKA-2852
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


We currently include reports for differences in attachment counts.  It would 
also be useful to report on the number of "unaligned" files by mime type.  

Consider two extracts of the same file, which contains attachments, produced by 
different versions of Tika.
{noformat}
ExtractA                  ExtractB
msword (container)        msword
   /emf                      /zip
   /emf
   /zip
{noformat}

We know from the current reports that msword files are missing attachments in 
ExtractB.  It would be useful to know that 2 emfs went missing in ExtractB, or, 
more generally, to sum the mime types of the missing attachments for both the 
B run and the A run.
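
A minimal sketch of the counting described above, in Python rather than in
tika-eval's own code. It assumes attachments are aligned purely by mime type
(a multiset difference per container file); tika-eval may well align on
something else, such as embedded path or digest. The function names
missing_attachment_mimes and sum_missing are hypothetical and are not part of
tika-eval.

```python
from collections import Counter


def missing_attachment_mimes(mimes_a, mimes_b):
    """Compare the attachment mime types of one container file in two extracts.

    Returns (missing_in_b, missing_in_a) as Counters keyed by mime type.
    Assumption: attachments are matched by mime type only.
    """
    count_a = Counter(mimes_a)
    count_b = Counter(mimes_b)
    # Counter subtraction keeps only positive counts, so each result holds
    # exactly the attachments that one side has and the other side lacks.
    missing_in_b = count_a - count_b
    missing_in_a = count_b - count_a
    return missing_in_b, missing_in_a


def sum_missing(pairs):
    """Sum missing-attachment mime counts over many (mimes_a, mimes_b) pairs."""
    total_missing_in_b = Counter()
    total_missing_in_a = Counter()
    for mimes_a, mimes_b in pairs:
        in_b, in_a = missing_attachment_mimes(mimes_a, mimes_b)
        total_missing_in_b.update(in_b)
        total_missing_in_a.update(in_a)
    return total_missing_in_b, total_missing_in_a


# The msword example from this issue: ExtractA has emf, emf, zip;
# ExtractB has only zip.
missing_in_b, missing_in_a = sum_missing([(["emf", "emf", "zip"], ["zip"])])
```

For the msword example, missing_in_b holds a count of 2 for emf and
missing_in_a is empty, which matches the "2 emfs went missing in ExtractB"
reading.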





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
