The "TikaEval" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaEval

New page:

= Overview of the 'tika-eval' Module =

The tika-eval module is not yet available; this page is a first draft of its documentation. The module is intended to support comparisons between the output of two tools or settings, and to give insight into the output of a single run. It is designed to help with Tika, but it could be used to evaluate other tools as well.

= Background =

= Quick Start Usage =

== Single Output from One Tool (Profile) ==

 1. Create a directory of extract files that mirrors your input directory. These files may be UTF-8 text files with '.txt' appended to the original file's name or they may be the !RecursiveParserWrapper's '.json' representation from tika-app's '-J -t' option.
 2. Profile the directory of extracts and create a local H2 database:
`java -jar tika-eval.X.Y.jar Profile -extractDir json -db profiledb`
 3.#3 Write reports from the database:
`java -jar tika-eval.X.Y.jar Report -db profiledb`

You'll have a directory of .xlsx reports under the "reports" directory.

== Comparing Output from Two Tools/Settings (Compare) ==

 1. Create two directories of extract files that mirror your input directory. These files may be UTF-8 text files with '.txt' appended to the original file's name or they may be the !RecursiveParserWrapper's '.json' representation from tika-app's '-J -t' option.
 2. Compare extract directory A with extract directory B and write the results to a local H2 database:
`java -jar tika-eval.X.Y.jar Compare -extractDirA tika_1_14 -extractDirB tika_1_15 -db comparisondb`
 3.#3 Write reports from the database:
`java -jar tika-eval.X.Y.jar Report -db comparisondb`

You'll have a directory of .xlsx reports under the "reports" directory.

== Investigating the Database ==

 1. Fire up the H2 localhost server:
`java -jar tika-eval.X.Y.jar StartDB`
This calls `java -cp . org.h2.tools.Console -web`.
 2.#2 Navigate a browser to {{http://localhost:8082}} and enter the JDBC prefix `jdbc:h2:` followed by the '''full path''' to your db file:
`jdbc:h2:/C:/users/someone/mystuff/tika-eval/comparisondb`

If your reaction is: "You call this a database?!", please open tickets and contribute to improving the structure.

= More detailed usage =

== Reports ==

The tika-eval module comes with a set of built-in reports. However, you might want to generate your own. Each report is specified by SQL and a few other configuration settings in an XML file. See `comparison-reports.xml` and `profile-reports.xml` to get a sense of the syntax.

To specify your own reports on the commandline, use `-rf` (report file):
`java -jar tika-eval.X.Y.jar Report -db comparisondb -rf myreports.xml`

If you'd like to write the reports to a root directory other than 'reports', specify that with `-rd` (report directory):
`java -jar tika-eval.X.Y.jar Report -db comparisondb -rd myreportdir`
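
As an illustration of the "mirrors your input directory" requirement in step 1 of the Quick Start, here is a minimal Python sketch that computes where each extract file would be expected to live. It is not part of tika-eval: the function names `mirrored_extract_path` and `plan_extracts` are hypothetical, and it assumes that the '.json' suffix is appended to the original file's name in the same way the page describes for '.txt'.

```python
from pathlib import Path


def mirrored_extract_path(input_root, extract_root, input_file, suffix=".json"):
    """Return where the extract for input_file would live under extract_root.

    The path relative to input_root is preserved, and the suffix is appended
    to the full original file name (so 'a/b.pdf' becomes 'a/b.pdf.json').
    Appending '.json' this way is an assumption; the page only states it
    explicitly for '.txt'.
    """
    if suffix not in (".txt", ".json"):
        raise ValueError("suffix must be '.txt' or '.json'")
    relative = Path(input_file).relative_to(input_root)
    return Path(extract_root) / relative.parent / (relative.name + suffix)


def plan_extracts(input_root, extract_root, suffix=".json"):
    """Map every regular file under input_root to its mirrored extract path."""
    input_root = Path(input_root)
    return {
        path: mirrored_extract_path(input_root, extract_root, path, suffix)
        for path in sorted(input_root.rglob("*"))
        if path.is_file()
    }
```

A mapping like the one returned by `plan_extracts("input", "json")` could be used to check that every input file has a counterpart in the extract directory before that directory is passed to `-extractDir`.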
