[
https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514494
]
Doron Cohen commented on LUCENE-947:
------------------------------------
I missed that one, thanks for the reminder - just a few comments:
1. TestPerfTasksParse - why do you exclude WriteLineDoc from the parsing
test?
I disabled that special handling and the test works as expected.
2. Documentation of new properties is missing:
- In CreateIndexTask: ram.flush.mb [0], autocommit [true]
- In byTask.package.html (same 2 props).
3. ram.flush.mb & autocommit should be added & used & documented also in
OpenIndexTask (currently only used in CreateIndexTask).
4. AddDocTask: flushAtRAMUsage - unused?
5. build.xml - 1024m as default for running a benchmark seems too much?
I mean, one of the nice things about Lucene is that it can run for you even
if you only have a few MB of RAM to spare. For someone with a low-end machine,
say 512M only, the JVM might fail to even start, right?
6. I like your change of factoring some of the field names into consts. We
should probably do the same for the rest.
7. I didn't try the new WriteLineDocTask and LineDocMaker feed. Partly because
there was no ready-to-use alg for that under conf/, and also no test for that.
Do you think we should add at least one of these two (preferably both)? - I
can help with this.
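
Regarding item 7, a test for the line feed would mostly need to check that a document survives the round trip through the one-document-per-line file. The sketch below only models the tab-separated TITLE, DATE, BODY layout described in the issue. The class name LineDocRoundTrip and the choice to replace embedded tabs and newlines with spaces are assumptions for illustration; this is not the code of WriteLineDocTask or LineDocMaker from the patch.

```java
// Hypothetical helper, not part of LUCENE-947: models one document per line
// as three tab-separated fields (TITLE, DATE, BODY).
final class LineDocRoundTrip {
    static final char SEP = '\t';

    // Join the three fields into a single line. Tabs and line breaks inside a
    // field are replaced by spaces so they cannot break the layout
    // (an assumption, the patch may handle this differently).
    static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    // Split a line back into exactly three fields. The -1 limit keeps
    // trailing empty fields, e.g. a document with an empty body.
    static String[] fromLine(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 tab-separated fields, got " + fields.length);
        }
        return fields;
    }

    private static String clean(String s) {
        return s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    public static void main(String[] args) {
        String line = toLine("Some title", "some date", "body\twith tab\nand newline");
        String[] fields = fromLine(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

A real test would feed a handful of documents through WriteLineDoc and then read them back with LineDocMaker, asserting on the same three fields.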
> Some improvements to contrib/benchmark
> --------------------------------------
>
> Key: LUCENE-947
> URL: https://issues.apache.org/jira/browse/LUCENE-947
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-947.patch, LUCENE-947.take2.patch
>
>
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
> - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
> - Print the props in sorted order
> - Added new config "autocommit=true|false" to CreateIndexTask
> - Added new config "ram.flush.mb=int" to AddDocTask
> - Added new configs "doc.term.vector.positions=true|false" and
> "doc.term.vector.offsets=true|false" to BasicDocMaker
> - Added WriteLineDocTask.java, so you can make an alg that uses this
> to build up a single file containing one document per line.
> EG this alg converts the reuters-out tree into a
> single file that has ~1000 bytes per body field, saved to
> work/reuters.1000.txt:
> docs.dir=reuters-out
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
> line.file.out=work/reuters.1000.txt
> doc.maker.forever=false
> {WriteLineDoc(1000)}: *
> Each line has tab-separated TITLE, DATE, BODY fields.
> - Created feeds/LineDocMaker.java that creates documents read from
> the file created by WriteLineDocTask.java. EG this alg indexes
> all documents created above:
> analyzer=org.apache.lucene.analysis.SimpleAnalyzer
> directory=FSDirectory
> doc.add.log.step=500
> docs.file=work/reuters.1000.txt
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
> doc.tokenized=true
> doc.maker.forever=false
> ResetSystemErase
> CreateIndex
> {AddDoc}: *
> CloseIndex
> RepSumByPref AddDoc
> I'll attach initial patch shortly.
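
For context on the first bullet of the description: SimpleDateFormat keeps mutable state and is not safe to share between threads, which matters when several benchmark threads pull documents from the same doc maker. One common remedy is to give each thread its own instance, sketched below. The class name, the date pattern and the use of ThreadLocal.withInitial (Java 8+) are assumptions for illustration; the patch may fix DirDocMaker differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical illustration, not the DirDocMaker code from the patch.
final class ThreadSafeDateParser {
    // One SimpleDateFormat per thread: instances are never shared, so no
    // synchronization is needed. The pattern is an assumed example.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(
            () -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US));

    // Parse with the calling thread's own formatter.
    static Date parse(String text) {
        try {
            return FORMAT.get().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + text, e);
        }
    }

    // Format with the calling thread's own formatter.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        Date d = parse("2007-07-20 10:15:30");
        System.out.println(format(d));
    }
}
```

Synchronizing on a single shared formatter would also be correct, at the cost of serializing the threads on every date parse.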