Some improvements to contrib/benchmark
--------------------------------------

                 Key: LUCENE-947
                 URL: https://issues.apache.org/jira/browse/LUCENE-947
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/benchmark
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor


I've made some small improvements to the contrib/benchmark, mostly
merging in the ad-hoc benchmarking code I've been using in LUCENE-843:

  - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat

  - Print the props in sorted order

  - Added new config "autocommit=true|false" to CreateIndexTask

  - Added new config "ram.flush.mb=int" to AddDocTask

  - Added new configs "doc.term.vector.positions=true|false" and
    "doc.term.vector.offsets=true|false" to BasicDocMaker

  - Added WriteLineDocTask.java, so you can make an alg that uses this
    to build up a single file containing one document per line in a
    single file.  EG this alg converts the reuters-out tree into a
    single file that has ~1000 bytes per body field, saved to
    work/reuters.1000.txt:

      docs.dir=reuters-out
      doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
      line.file.out=work/reuters.1000.txt
      doc.maker.forever=false
      {WriteLineDoc(1000)}: *

    Each line has tab-separted TITLE, DATE, BODY fields.

  - Created feeds/LineDocMaker.java that creates documents read from
    the file created by WriteLineDocTask.java.  EG this alg indexes
    all documents created above:

      analyzer=org.apache.lucene.analysis.SimpleAnalyzer
      directory=FSDirectory
      doc.add.log.step=500

      docs.file=work/reuters.1000.txt
      doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
      doc.tokenized=true
      doc.maker.forever=false

      ResetSystemErase
      CreateIndex
      {AddDoc}: *
      CloseIndex

      RepSumByPref AddDoc

I'll attach initial patch shortly.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to