Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaEval" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaEval?action=diff&rev1=7&rev2=8

  
  == Reading Extracts ==
  
- === alterMetadata ===
+ === alterExtract ===
  Let's say you want to compare the output of Tika to another tool that 
extracts text.  You happen to have a directory of .json files for Tika and a 
directory of UTF-8 .txt files from the other tool.
  
   1. If the other tool extracts embedded content, you'd want to concatenate 
all the content within Tika's .json file for a fair comparison:
-     `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterMetadata concatenate_content`
+     `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterExtract concatenate_content`
   
   2.#2 If the other tool does not extract embedded content, you'd only want to 
look at the first metadata object (representing the container file) in the 
.json file:
-     `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterMetadata first_only`
+     `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb -alterExtract first_only`
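  To make the difference between the two options concrete, here is a minimal Python sketch of what they mean for a single extract. It is an illustration, not tika-eval's actual implementation. It assumes each Tika .json file holds a list of metadata objects with the container file first, and that the extracted text sits under the `X-TIKA:content` key. The function names mirror the option values only for readability, and the newline separator used when joining is an assumption.

```python
import json

# Assumption: extracted text is stored under this key in each metadata object.
CONTENT_KEY = "X-TIKA:content"


def first_only(metadata_list):
    """Return the text of the first metadata object (the container file) only."""
    if not metadata_list:
        return ""
    return metadata_list[0].get(CONTENT_KEY) or ""


def concatenate_content(metadata_list):
    """Return the container text followed by the text of every embedded document.

    The newline separator is an assumption made for this sketch.
    """
    return "\n".join(m.get(CONTENT_KEY) or "" for m in metadata_list)


# Hypothetical extract: a container file with one embedded attachment.
sample = json.loads(
    '[{"X-TIKA:content": "container text"},'
    ' {"X-TIKA:content": "attachment text"}]'
)

print(first_only(sample))
print(concatenate_content(sample))
```

  If the other tool's .txt output includes attachment text, the concatenated form is the fair thing to compare against; otherwise the first-only form is.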
  
  == Reports ==
  The tika-eval module comes with a set of reports.  However, you might want 
to generate your own.  Each report is specified by SQL and a few other 
settings in an XML file.  See `comparison-reports.xml` and 
`profile-reports.xml` to get a sense of the syntax.
