Tim Allison created TIKA-3476:
---------------------------------

             Summary: Remove tag reports from default tika-eval reports
                 Key: TIKA-3476
                 URL: https://issues.apache.org/jira/browse/TIKA-3476
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


tika-eval can run on xhtml output from Tika.  When it does, it maintains counts 
of the xhtml tags, and then allows for sums of those tags per file type and 
comparison of the tags extracted.

When tika-eval is run against text output from Tika, the tag report queries 
take 30 seconds per tag type on a million files because of the joins.
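
For illustration only, here is a rough sketch of the kind of per-tag query 
involved.  The schema (the mimes, profiles and tags tables and their columns) 
and the helper sum_tag_by_mime are hypothetical simplifications, not the actual 
tika-eval tables or report SQL; the point is only that each tag type needs its 
own pass through the joins.

```python
import sqlite3

# Hypothetical, simplified schema; the real tika-eval tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mimes (mime_id INTEGER PRIMARY KEY, mime_string TEXT);
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, mime_id INTEGER);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tags_p INTEGER, tags_table INTEGER);
""")
conn.executemany("INSERT INTO mimes VALUES (?, ?)",
                 [(1, "application/pdf"), (2, "text/html")])
conn.executemany("INSERT INTO profiles VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                 [(1, 10, 0), (2, 5, 1), (3, 7, 3)])

ALLOWED_TAG_COLUMNS = {"tags_p", "tags_table"}

def sum_tag_by_mime(conn, tag_column):
    """Sum one tag type per file type; one such query runs per tag type."""
    # The column name is interpolated into the SQL, so restrict it to known names.
    if tag_column not in ALLOWED_TAG_COLUMNS:
        raise ValueError(f"unknown tag column: {tag_column}")
    sql = f"""
        SELECT m.mime_string, SUM(t.{tag_column})
        FROM tags t
        JOIN profiles p ON p.id = t.id
        JOIN mimes m ON m.mime_id = p.mime_id
        GROUP BY m.mime_string
        ORDER BY m.mime_string
    """
    return conn.execute(sql).fetchall()

p_counts = sum_tag_by_mime(conn, "tags_p")
table_counts = sum_tag_by_mime(conn, "tags_table")
print(p_counts)
print(table_counts)
```

With text output the tag counts carry no information, so every one of these 
joined queries is wasted work.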

In Tika 2.x, let's turn off tag reports by default, but allow users to include 
them if needed with the existing {{-rf}} (reports file) command-line option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
