[ 
https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230287#comment-15230287
 ] 

Tim Allison commented on TIKA-1332:
-----------------------------------

I gave up on that, and we're now using httpd.

The eval code currently exists as command-line calls. I'm using h2 as the 
backend database, which appears to be compatible with ASL 2.0. As with all 
development cycles, I started with a flat file, moved to an unfortunately 
complex db structure and will probably have to move to nosql if we want this to 
scale...but not yet...

As above, there are two modes.
1) Profile a single run
   a) run tika-app on a directory of files, output with -J -t (JSON 
representation of List<Metadata> with text as the content)
   b) run the profiling code, which populates an h2 db
   c) run xml-configured reports against the db

2) Compare two runs
   a) run two versions of tika-app on a directory of files
   b) run the comparison code, which populates an h2 db
   c) run xml-configured reports against the db
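
To make the two modes concrete, here is a rough Python sketch of the kind of 
numbers each one produces. This is not the eval code itself: it keeps 
everything in memory instead of populating an h2 db, the function names 
(text_length, profile_run, compare_runs) are made up for illustration, and the 
metadata key names are assumptions that should be checked against the JSON 
your version of tika-app actually writes. Each run is assumed to be already 
loaded as a dict from file name to that file's list of metadata dicts.

```python
from collections import Counter

# Assumed key names; verify them against the JSON your Tika version emits.
CONTENT_KEY = "X-TIKA:content"
TYPE_KEY = "Content-Type"
EXCEPTION_KEY = "X-TIKA:EXCEPTION:runtime"


def text_length(metadata_list):
    """Total characters of extracted text across a file's metadata entries."""
    return sum(len(m.get(CONTENT_KEY) or "") for m in metadata_list)


def profile_run(run):
    """Mode 1: summarize a single run ({file name: list of metadata dicts})."""
    files_per_type = Counter()
    exceptions_per_type = Counter()
    total_chars = 0
    for metadata_list in run.values():
        if not metadata_list:
            continue
        container = metadata_list[0]  # first entry describes the file itself
        mime = container.get(TYPE_KEY, "unknown")
        files_per_type[mime] += 1
        if EXCEPTION_KEY in container:
            exceptions_per_type[mime] += 1
        total_chars += text_length(metadata_list)
    return {
        "files_per_type": dict(files_per_type),
        "exceptions_per_type": dict(exceptions_per_type),
        "total_chars": total_chars,
    }


def compare_runs(run_a, run_b):
    """Mode 2: per-file change in extracted text length between two runs.

    Only files present in both runs are compared. Each row is
    (file name, length in run A, length in run B, B minus A).
    """
    rows = []
    for name in sorted(run_a.keys() & run_b.keys()):
        len_a = text_length(run_a[name])
        len_b = text_length(run_b[name])
        rows.append((name, len_a, len_b, len_b - len_a))
    return rows
```

The output of either function is only raw material for a report. A large drop 
in text length for one file type, for example, still needs a person to open 
the files and decide whether it is a regression or an improvement.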

I've pretty much given up on the notion of automatic testing.  A human has to 
look at the reports and make sense of them.

Given the feedback I received at ApacheCon (egads, a year ago), I think I'd 
like to transition this code into Tika for 1.14.

When the code is ready for review, I'll let y'all know.  Any and all feedback 
on the reports to date would be great.


> Create "eval" code
> ------------------
>
>                 Key: TIKA-1332
>                 URL: https://issues.apache.org/jira/browse/TIKA-1332
>             Project: Tika
>          Issue Type: Sub-task
>          Components: cli, general, server
>            Reporter: Tim Allison
>
> For this issue, we can start with code to gather statistics on each run (# of 
> exceptions per file type, most common exceptions per file type, number of 
> metadata items, total text extracted, etc).  We should also be able to 
> compare one run against another.  Going forward, there's plenty of room to 
> improve.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
