Configurable HTML Parser, external classes to path, exhaustive doc maker
------------------------------------------------------------------------

                 Key: LUCENE-849
                 URL: https://issues.apache.org/jira/browse/LUCENE-849
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/benchmark
            Reporter: Doron Cohen
         Assigned To: Doron Cohen
            Priority: Minor


"doc making" enhancements:

1. Allow a configurable HTML parser via a new html.parser property.
Currently TrecDocMaker uses the Demo HTML parser. With this new property
this can be overridden.

2. Allow adding an external class path, so the benchmark can be used with
modified makers/parsers without having to add code to Lucene.
Run the benchmark with e.g. "ant run-task 
-Dbenchmark.ext.classpath=/myproj/myclasses"

3. Allow crawling a doc maker until all its files/docs are exhausted once,
without having to know in advance how many docs it can make.
This can be useful, for instance, if the input data is in zip files.
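
To illustrate items 1 and 3, here is a minimal sketch, not the actual patch.
Only the html.parser property name comes from this issue. The HtmlParser and
DocSource interfaces, the TagStrippingParser default, and the method names
are hypothetical stand-ins, and signalling exhaustion with null is an
assumption made for the example.

```java
import java.util.Properties;

public class DocMakingSketch {

    /** Hypothetical parser contract; not the benchmark's real interface. */
    public interface HtmlParser {
        String parse(String html);
    }

    /** Hypothetical default parser: strips tags with a naive regex. */
    public static class TagStrippingParser implements HtmlParser {
        @Override
        public String parse(String html) {
            return html.replaceAll("<[^>]*>", "").trim();
        }
    }

    /**
     * Item 1: pick the parser class named by the html.parser property,
     * falling back to the (hypothetical) default when it is not set.
     */
    public static HtmlParser loadParser(Properties config) {
        String className =
            config.getProperty("html.parser", TagStrippingParser.class.getName());
        try {
            // The class may come from an external class path (item 2),
            // as long as it is visible to this class loader.
            Class<?> cls = Class.forName(className);
            return (HtmlParser) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot create html.parser: " + className, e);
        }
    }

    /** Hypothetical doc source; returns null once all docs were produced. */
    public interface DocSource {
        String nextDoc();
    }

    /**
     * Item 3: consume a source until it is exhausted, without knowing the
     * number of docs in advance. Returns how many docs were parsed.
     */
    public static int consumeAll(DocSource source, HtmlParser parser) {
        int count = 0;
        String raw;
        while ((raw = source.nextDoc()) != null) {
            parser.parse(raw);
            count++;
        }
        return count;
    }
}
```

A custom parser would then be selected by setting html.parser to its fully
qualified class name, provided that class has a public no-argument
constructor and is reachable on the class path.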

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

