Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaEval" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaEval?action=diff&rev1=7&rev2=8

  == Reading Extracts ==

- === alterMetadata ===
+ === alterExtract ===

  Let's say you want to compare the output of Tika to another tool that extracts text. You happen to have a directory of .json files for Tika and a directory of UTF-8 .txt files from the other tool.

  1. If the other tool extracts embedded content, you'd want to concatenate all the content within Tika's .json file for a fair comparison:
- `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterMetadata concatenate_content`
+ `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterExtract concatenate_content`

  2.#2 If the other tool does not extract embedded content, you'd only want to look at the first metadata object (representing the container file) in the .json file:
- `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterMetadata first_only`
+ `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterExtract first_only`

  == Reports ==

  The module tika-eval comes with a list of reports. However, you might want to generate your own. Each report is specified by sql and a few other configurations in an xml file. See `comparison-reports.xml` and `profile-reports.xml` to get a sense of the syntax.
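To make the two `-alterExtract` values in this change easier to picture, here is a minimal Java sketch of what they plausibly do to a single Tika .json extract. It assumes the extract has already been parsed into a list of metadata maps (container first, then embedded documents) with the extracted text under Tika's `X-TIKA:content` key. The class `AlterExtractSketch`, the `AlterExtract` enum, and the `alter` method are hypothetical illustrations, not tika-eval's own code, and the newline separator used by the concatenation is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlterExtractSketch {
    // Key under which Tika stores extracted text in its recursive .json output.
    static final String CONTENT_KEY = "X-TIKA:content";

    // Hypothetical names mirroring "first_only" and "concatenate_content".
    enum AlterExtract { AS_IS, FIRST_ONLY, CONCATENATE_CONTENT }

    static List<Map<String, String>> alter(List<Map<String, String>> extract, AlterExtract mode) {
        if (extract.isEmpty() || mode == AlterExtract.AS_IS) {
            return extract;
        }
        // Copy the container's metadata so the caller's maps are not modified.
        Map<String, String> container = new LinkedHashMap<>(extract.get(0));
        if (mode == AlterExtract.CONCATENATE_CONTENT) {
            // Gather the text of the container and every embedded document.
            StringBuilder sb = new StringBuilder();
            for (Map<String, String> m : extract) {
                String content = m.get(CONTENT_KEY);
                if (content == null) {
                    continue;
                }
                if (sb.length() > 0) {
                    sb.append('\n'); // ASSUMPTION: newline between documents
                }
                sb.append(content);
            }
            container.put(CONTENT_KEY, sb.toString());
        }
        // Both modes keep only one metadata object: the (possibly enriched) container.
        List<Map<String, String>> result = new ArrayList<>();
        result.add(container);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> zip = new LinkedHashMap<>();
        zip.put("Content-Type", "application/zip");
        zip.put(CONTENT_KEY, "container text");
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("Content-Type", "text/plain");
        doc.put(CONTENT_KEY, "embedded text");
        List<Map<String, String>> extract = List.of(zip, doc);

        System.out.println(alter(extract, AlterExtract.FIRST_ONLY).get(0).get(CONTENT_KEY));
        System.out.println(alter(extract, AlterExtract.CONCATENATE_CONTENT).get(0).get(CONTENT_KEY));
    }
}
```

In this sketch both options collapse the extract to one metadata object, so it lines up one-to-one with a single .txt file from the other tool; they differ only in whether the embedded documents' text is folded into that object or dropped.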
