[ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514600 ]
Michael McCandless commented on LUCENE-947:
-------------------------------------------

Thanks for the review Doron!

> 1. TestPerfTasksParse - why do you prevent the testing of parsing of
> WriteLineDoc?
> I disabled the special handling of this and the test works as supposed.

Hmmm ... I was seeing a failure if I didn't do that because
WriteLineDoc requires "line.file.out" Config to be set and that test
didn't know to do so. I'll put it back into the test but add
"line.file.out" for this task.

> 2. Documentation of new properties is missing:
> - In CreateIndexTask: ram.flush.mb [0], autocommit [true]
> - In byTask.package.html (same 2 props).

OK, I'll add this and also for "doc.term.vector.{offsets,positions}"
to BasicDocMaker.

> 3. ram.flush & autoCommit should be added & used & documented also in
> openIndexTask (currently only used in createIndexTask).

OK, I'll add this.

> 4. AddDocTask: flushAtRAMUsage - unused?

Yup, this was leftover from pre LUCENE-843 where you had to check RAM
usage after each doc and then flush. I'll remove it and actually just
revert to current AddDocTask.java (I don't need any mods here).

> 5. build.xml - 1024m as default for running a benchmark seems too much?
> I mean, one of the nice things about Lucene is that it can run for you
> even if you only have few MB of RAM to spare. For someone with a low level
> machine, say 512M only, the JVM might fail to even start, right?

Woops... I didn't mean to put this change in. I'll leave it where it
was (140 MB) and remove the "-server" jvmarg as well. I was hitting
OOM on some Wikipedia algs.

> 6. I like your change of factoring some of the field names into consts. We
> should probably do the same for the rest.

OK I'll pull out the remaining ones...

> 7. I didn't try the new WriteLineDocTask and LineDocMaker feed. Partly
> because there was no ready to use alg for that under conf/, and also no test
> for that. Do you think we should add at least one of these two (preferably
> both)?
> - I can help with this.

OK I'll do both of these.

> Some improvements to contrib/benchmark
> --------------------------------------
>
>                 Key: LUCENE-947
>                 URL: https://issues.apache.org/jira/browse/LUCENE-947
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-947.patch, LUCENE-947.take2.patch
>
>
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
>   - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
>   - Print the props in sorted order
>   - Added new config "autocommit=true|false" to CreateIndexTask
>   - Added new config "ram.flush.mb=int" to AddDocTask
>   - Added new configs "doc.term.vector.positions=true|false" and
>     "doc.term.vector.offsets=true|false" to BasicDocMaker
>   - Added WriteLineDocTask.java, so you can make an alg that uses this
>     to build up a single file containing one document per line.
>     EG this alg converts the reuters-out tree into a
>     single file that has ~1000 bytes per body field, saved to
>     work/reuters.1000.txt:
>       docs.dir=reuters-out
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
>       line.file.out=work/reuters.1000.txt
>       doc.maker.forever=false
>       {WriteLineDoc(1000)}: *
>     Each line has tab-separated TITLE, DATE, BODY fields.
>   - Created feeds/LineDocMaker.java that creates documents read from
>     the file created by WriteLineDocTask.java. EG this alg indexes
>     all documents created above:
>       analyzer=org.apache.lucene.analysis.SimpleAnalyzer
>       directory=FSDirectory
>       doc.add.log.step=500
>       docs.file=work/reuters.1000.txt
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>       doc.tokenized=true
>       doc.maker.forever=false
>       ResetSystemErase
>       CreateIndex
>       {AddDoc}: *
>       CloseIndex
>       RepSumByPref AddDoc
> I'll attach initial patch shortly.
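
For readers unfamiliar with the one-document-per-line layout described above (tab-separated TITLE, DATE, BODY), here is a rough sketch of writing and reading such a line. It is not the code from the patch: the class name LineDocSketch, the helper names, and the choice to replace embedded tabs and line breaks with spaces are all assumptions made for illustration.

```java
// Illustrative sketch only; not taken from WriteLineDocTask or LineDocMaker.
class LineDocSketch {
    // Field separator, per the "tab-separated TITLE, DATE, BODY" description.
    static final char SEP = '\t';

    // Assumption: tabs and line breaks inside a field are replaced with
    // spaces so that one document always occupies exactly one line.
    static String clean(String field) {
        return field.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    // Build one line holding TITLE, DATE, BODY in that order.
    static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    // Split a line back into its three fields; the limit of 3 keeps
    // an empty trailing BODY instead of dropping it.
    static String[] parseLine(String line) {
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = toLine("Some title", "2007-07-22", "Body text\twith a tab");
        String[] fields = parseLine(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

Cleaning the fields on the write side keeps the read side to a single split per line, which is the main appeal of a line-per-document file for benchmarking.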