Factor out ByteSliceWriter from DocumentsWriterFieldData
--------------------------------------------------------

                 Key: LUCENE-1283
                 URL: https://issues.apache.org/jira/browse/LUCENE-1283
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.3.1, 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.4
         Attachments: LUCENE-1283.patch

DocumentsWriter uses byte slices into shared byte[]'s to hold the growing postings data for many different terms in memory. This is probably the trickiest (most confusing) part of DocumentsWriter. Right now it's not cleanly factored out and not easy to separately test.

In working on this issue:

  http://mail-archives.apache.org/mod_mbox/lucene-java-user/200805.mbox/[EMAIL PROTECTED]

which eventually turned out to be a bug in Oracle JRE's JIT compiler, I factored out ByteSliceWriter and created a unit test to stress test the writing & reading of byte slices. The test just randomly writes N streams interleaved into shared byte[]'s, then reads them back verifying the results are correct.

I created the stress test to try to find any bugs in that code. The test ran fine (no bugs were found) but I think the refactoring is still very much worthwhile.
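
For readers who have not seen the byte-slice scheme, the shape of such a stress test can be sketched as below. This is only an illustrative model, not the code in the attached patch: the class SliceStressSketch, its methods, and the fixed 12-byte slice payload are all assumptions made for the example (the real slices in DocumentsWriter grow in size and are managed differently). It writes several streams interleaved into one shared byte[], chains slices with a 4-byte forwarding address, then reads every stream back and compares it with what was written.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Simplified model, NOT Lucene's ByteSliceWriter: fixed-size slices chained by a forward address. */
public class SliceStressSketch {
    static final int DATA = 12;         // payload bytes per slice (assumed for this sketch)
    static final int SLICE = DATA + 4;  // payload plus 4-byte address of the next slice

    byte[] pool = new byte[SLICE];      // shared buffer, grown on demand
    int used = 0;                       // next free offset in the pool

    /** Reserves one slice at the end of the pool and returns its start offset. */
    int newSlice() {
        if (used + SLICE > pool.length) pool = Arrays.copyOf(pool, pool.length * 2);
        int start = used;
        used += SLICE;
        return start;
    }

    /** Appends b to a stream; pos[0] is its write offset, pos[1] the payload bytes left in its slice. */
    void write(int[] pos, byte b) {
        if (pos[1] == 0) {              // slice full: pos[0] now points at the address field
            int next = newSlice();
            int p = pos[0];
            pool[p] = (byte) (next >>> 24);
            pool[p + 1] = (byte) (next >>> 16);
            pool[p + 2] = (byte) (next >>> 8);
            pool[p + 3] = (byte) next;
            pos[0] = next;
            pos[1] = DATA;
        }
        pool[pos[0]++] = b;
        pos[1]--;
    }

    /** Reads length bytes of the stream whose first slice begins at start, following forward addresses. */
    byte[] read(int start, int length) {
        byte[] out = new byte[length];
        int p = start, left = DATA;
        for (int i = 0; i < length; i++) {
            if (left == 0) {            // end of slice payload: jump to the next slice
                p = ((pool[p] & 0xFF) << 24) | ((pool[p + 1] & 0xFF) << 16)
                  | ((pool[p + 2] & 0xFF) << 8) | (pool[p + 3] & 0xFF);
                left = DATA;
            }
            out[i] = pool[p++];
            left--;
        }
        return out;
    }

    /** Randomly interleaves numWrites single-byte writes across numStreams streams, then verifies each. */
    static boolean stress(long seed, int numStreams, int numWrites) {
        Random r = new Random(seed);
        SliceStressSketch s = new SliceStressSketch();
        int[] starts = new int[numStreams];
        int[][] pos = new int[numStreams][];
        ByteArrayOutputStream[] expected = new ByteArrayOutputStream[numStreams];
        for (int i = 0; i < numStreams; i++) {
            starts[i] = s.newSlice();
            pos[i] = new int[] {starts[i], DATA};
            expected[i] = new ByteArrayOutputStream();
        }
        for (int i = 0; i < numWrites; i++) {
            int stream = r.nextInt(numStreams);
            byte b = (byte) r.nextInt(256);
            s.write(pos[stream], b);
            expected[stream].write(b);  // independent record of what this stream should contain
        }
        for (int i = 0; i < numStreams; i++) {
            byte[] want = expected[i].toByteArray();
            if (!Arrays.equals(want, s.read(starts[i], want.length))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stress(17L, 10, 100000) ? "all streams verified" : "MISMATCH");
    }
}
```

Keeping a separate ByteArrayOutputStream per stream gives the check an independent source of truth, so a slice-chaining bug shows up as a mismatch rather than going unnoticed.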

I expected the changes to reduce indexing throughput, so I ran a test indexing the first 200K Wikipedia docs using this alg:

{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker

docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = true
doc.term.vector = true
doc.add.log.step=2000

directory=FSDirectory
autocommit=false
compound=true

ram.flush.mb=256

{ "Rounds"
  ResetSystemErase
  { "BuildIndex"
    - CreateIndex
     { "AddDocs" AddDoc > : 200000
    - CloseIndex
  }
  NewRound
} : 4

RepSumByPrefRound BuildIndex
{code}

On trunk it produces these results:

{code}
Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
BuildIndex      0        1       200000        791.7      252.63   338,552,096  1,061,814,272
BuildIndex -  - 1 -  -   1 -  -  200000 -  -   793.1 -  -  252.18 - 605,262,080  1,061,814,272
BuildIndex      2        1       200000        794.8      251.63   601,966,528  1,061,814,272
BuildIndex -  - 3 -  -   1 -  -  200000 -  -   782.5 -  -  255.58 - 608,699,712  1,061,814,272
{code}

and with the patch:

{code}
Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
BuildIndex      0        1       200000        745.0      268.47   338,318,784  1,061,814,272
BuildIndex -  - 1 -  -   1 -  -  200000 -  -   792.7 -  -  252.30 - 605,331,776  1,061,814,272
BuildIndex      2        1       200000        786.7      254.24   602,915,712  1,061,814,272
BuildIndex -  - 3 -  -   1 -  -  200000 -  -   795.3 -  -  251.48 - 602,378,624  1,061,814,272
{code}

So it looks like the performance cost of this change is negligible (in the noise).
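
As a rough check on the "in the noise" reading, the rec/s figures reported above can be averaged directly. The snippet below is only a convenience sketch; the class name ThroughputCompare and its helper are made up for the example, and the numbers are copied from the two result tables.

```java
public class ThroughputCompare {
    // rec/s per round, copied from the trunk and patch result tables
    static final double[] TRUNK = {791.7, 793.1, 794.8, 782.5};
    static final double[] PATCH = {745.0, 792.7, 786.7, 795.3};

    /** Mean of xs[from..end), so the first round can optionally be skipped. */
    static double mean(double[] xs, int from) {
        double sum = 0;
        for (int i = from; i < xs.length; i++) sum += xs[i];
        return sum / (xs.length - from);
    }

    public static void main(String[] args) {
        System.out.printf("all rounds: trunk %.1f, patch %.1f rec/s%n", mean(TRUNK, 0), mean(PATCH, 0));
        System.out.printf("rounds 1-3: trunk %.1f, patch %.1f rec/s%n", mean(TRUNK, 1), mean(PATCH, 1));
    }
}
```

Over all four rounds the means come to about 790.5 rec/s on trunk and 779.9 rec/s with the patch, a gap of roughly 1.3% that comes almost entirely from the patched round 0. Over rounds 1 to 3 the means are about 790.1 and 791.6 rec/s, which is within the round-to-round variation seen on trunk alone.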