Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaEval" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaEval

New page:
= Overview of the 'tika-eval' Module =
The tika-eval module is not yet available; this page is a first draft of its 
documentation.

The module is intended to enable comparisons between the output of different 
tools or to gain insight from a single run of one tool.  
It is designed to help with Tika, but it could be used to evaluate other tools 
as well.

= Background =

= Quick Start Usage =

== Single Output from One Tool (Profile) ==
 1. Create a directory of extract files that mirrors your input directory. 
These files may be UTF-8 text files with '.txt' appended to the original file's 
name or they may be the !RecursiveParserWrapper's '.json' representation from 
tika-app's '-J -t' option.
 
 2. Profile the directory of extracts and create a local H2 database: 
    `java -jar tika-eval.X.Y.jar Profile -extractDir json -db profiledb`
 
 3.#3 Write reports from the database:

    `java -jar tika-eval.X.Y.jar Report -db profiledb`

You'll have a directory of .xlsx reports under the "reports" directory.
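
Step 1 above leaves the creation of the extract directory to you. The following is a minimal sketch of one way to do it, not part of tika-eval itself. It assumes a POSIX shell, that `TIKA_APP_JAR` (a hypothetical variable) holds an absolute path to your tika-app jar, and that '.json' is appended to the original file's name in the same way as described for '.txt'. The function names are made up for illustration.

```shell
# Hypothetical location of the tika-app jar; use an absolute path,
# because mirror_extracts changes directory before calling it.
TIKA_APP_JAR="${TIKA_APP_JAR:-/path/to/tika-app.jar}"

# Extract one file to stdout as RecursiveParserWrapper JSON ('-J -t').
extract_one() {
    java -jar "$TIKA_APP_JAR" -J -t "$1"
}

# mirror_extracts INPUT_DIR EXTRACT_DIR
# For every file under INPUT_DIR, write EXTRACT_DIR/<same relative path>.json.
# The body runs in a subshell, so the 'cd' does not affect the caller.
# Assumes file names contain no newlines.
mirror_extracts() (
    mkdir -p "$2" || exit 1
    extract_dir=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    find . -type f | while IFS= read -r src; do
        rel=${src#./}
        out="$extract_dir/$rel.json"
        mkdir -p "$(dirname "$out")"
        # </dev/null keeps the extractor from consuming the file list on stdin
        extract_one "$src" > "$out" < /dev/null || echo "extract failed: $rel" >&2
    done
)
```

After sourcing the functions, `mirror_extracts input json` would fill the `json` directory used in step 2. Starting one JVM per file is slow on large collections, so treat this as an illustration of the directory layout rather than a recommended way to run Tika in bulk.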

== Comparing Output from Two Tools/Settings (Compare) ==

 1. Create two directories of extract files that mirror your input directory. 
These files may be UTF-8 text files with '.txt' appended to the original file's 
name or they may be the !RecursiveParserWrapper's '.json' representation from 
tika-app's '-J -t' option.
 
 2. Compare the extract directory A with extract directory B and write results 
to a local H2 database:
    `java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb`
 
 3.#3 Write reports from the database:
    `java -jar tika-eval.X.Y.jar Report -db comparisondb`

You'll have a directory of .xlsx reports under the "reports" directory.
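
Because both extract directories are meant to mirror the same input directory, it can be useful to check that they contain the same relative paths before running the comparison. The following is a small sketch of such a check, assuming a POSIX shell with `find`, `sort`, `comm` and `mktemp`. The function names are hypothetical, and this page does not say how tika-eval itself treats a file that has an extract in only one directory.

```shell
# list_extracts DIR: print the sorted relative paths of all files under DIR.
list_extracts() (
    cd "$1" || exit 1
    find . -type f | sed 's|^\./||' | sort
)

# unmatched_extracts DIR_A DIR_B
# Print one line per extract that exists in only one of the two directories.
unmatched_extracts() (
    tmp=$(mktemp -d) || exit 1
    list_extracts "$1" > "$tmp/a"
    list_extracts "$2" > "$tmp/b"
    # comm -23: lines unique to the first file; comm -13: unique to the second
    comm -23 "$tmp/a" "$tmp/b" | sed 's/^/only in A: /'
    comm -13 "$tmp/a" "$tmp/b" | sed 's/^/only in B: /'
    rm -rf "$tmp"
)
```

For the example above, `unmatched_extracts tika_1_14 tika_1_15` prints nothing when the two directories line up.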

== Investigating the Database ==

 1. Fire up the H2 localhost server:
    `java -jar tika-eval.X.Y.jar StartDB` -- this calls 
`java -cp . org.h2.tools.Console -web`
 2.#2 Point a browser at {{http://localhost:8082}} and enter the JDBC URL, 
that is, `jdbc:h2:` followed by the '''full path''' to your db file:
    `jdbc:h2:/C:/users/someone/mystuff/tika-eval/comparisondb`

If your reaction is: "You call this a database?!", please open tickets and 
contribute to improving the structure.
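
The JDBC URL in step 2 is just the `jdbc:h2:` prefix plus an absolute path, so it can be built from a relative db name. A minimal sketch for Unix-like systems follows; `h2_url` is a hypothetical helper name, and the Windows form shown above (with the drive letter) is not handled.

```shell
# h2_url DB_PATH: print the H2 JDBC URL for a database path.
# The parent directory of DB_PATH must already exist.
h2_url() (
    dir=$(cd "$(dirname "$1")" && pwd) || exit 1
    printf 'jdbc:h2:%s/%s\n' "$dir" "$(basename "$1")"
)
```

Running `h2_url comparisondb` from the directory that holds the database prints a URL that can be pasted into the H2 console's JDBC URL field.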


= More Detailed Usage =



== Reports ==
The tika-eval module comes with a set of default reports.  However, you might 
want to generate your own.  Each report is specified by SQL and a few other 
configurations in an XML file.  See `comparison-reports.xml` and 
`profile-reports.xml` to get a sense of the syntax.

To specify your own reports on the command line, use `-rf` (report file):
    `java -jar tika-eval.X.Y.jar Report -db comparisondb -rf myreports.xml`

If you'd like to write the reports to a root directory other than 'reports', 
specify that with `-rd` (report directory):
    `java -jar tika-eval.X.Y.jar Report -db comparisondb -rd myreportdir`
