Tim Allison created TIKA-3476:
---------------------------------
Summary: Remove tag reports from default tika-eval reports
Key: TIKA-3476
URL: https://issues.apache.org/jira/browse/TIKA-3476
Project: Tika
Issue Type: Task
Reporter: Tim Allison
tika-eval can run on XHTML output from Tika. When it does, it maintains counts
of the XHTML tags in each file, and then reports sums of those tags per file
type and compares the tags extracted.
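To illustrate what "counts of the XHTML tags in each file" means, here is a minimal sketch that counts element names in one XHTML document with the JDK's built-in SAX parser. It is not tika-eval's actual implementation; the class name `TagCounter` and its `count` method are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: per-file tag counts, keyed by element name.
public class TagCounter extends DefaultHandler {
    // TreeMap keeps tag names sorted for stable output
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With a namespace-aware parser, localName is "p", "table", etc.
        String name = localName.isEmpty() ? qName : localName;
        counts.merge(name, 1, Integer::sum);
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }

    // Parses one XHTML string and returns how often each tag occurs.
    public static Map<String, Integer> count(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        TagCounter handler = new TagCounter();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getCounts();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>one</p><p>two</p><table><tr><td>x</td></tr></table></body></html>";
        // Prints {body=1, html=1, p=2, table=1, td=1, tr=1}
        System.out.println(count(xhtml));
    }
}
```

Summing such per-file maps across all files of a given type gives the kind of per-file-type tag totals described above; in tika-eval those totals come from database queries, which is where the join cost arises.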
When tika-eval is run against text output from Tika, these tag report queries
still take 30 seconds per tag type on a million files because of the joins.
In Tika 2.x, let's turn off tag reports by default, but allow users to include
them if needed with the existing {{-rf}} (reports file) command-line option.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)