[jira] Resolved: (LUCENE-1283) Factor out ByteSliceWriter from DocumentsWriterFieldData

Michael McCandless (JIRA) Sat, 17 May 2008 09:52:18 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless resolved LUCENE-1283.
----------------------------------------

    Resolution: Fixed

> Factor out ByteSliceWriter from DocumentsWriterFieldData
> --------------------------------------------------------
>
>                 Key: LUCENE-1283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1283
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3, 2.3.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1283.patch
>
>
> DocumentsWriter uses byte slices into shared byte[]'s to hold the
> growing postings data for many different terms in memory.  This is
> probably the trickiest (most confusing) part of DocumentsWriter.
> Right now it's not cleanly factored out and not easy to separately
> test.  In working on this issue:
>   
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200805.mbox/[EMAIL PROTECTED]
> which eventually turned out to be a bug in Oracle JRE's JIT compiler,
> I factored out ByteSliceWriter and created a unit test to stress test
> the writing & reading of byte slices.  The test just randomly writes N
> streams interleaved into shared byte[]'s, then reads them back
> verifying the results are correct.
> I created the stress test to try to find any bugs in that code.  The
> test ran fine (no bugs were found) but I think the refactoring is
> still very much worthwhile.
> I expected the changes to reduce indexing throughput, so I ran a test
> indexing the first 200K Wikipedia docs using this alg:
> {code}
> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
> docs.file=/Volumes/External/lucene/wiki.txt
> doc.stored = true
> doc.term.vector = true
> doc.add.log.step=2000
> directory=FSDirectory
> autocommit=false
> compound=true
> ram.flush.mb=256
> { "Rounds"
>   ResetSystemErase
>   { "BuildIndex"
>     - CreateIndex
>      { "AddDocs" AddDoc > : 200000
>     - CloseIndex
>   }
>   NewRound
> } : 4
> RepSumByPrefRound BuildIndex
> {code}
> On trunk it produces these results:
> {code}
> Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
> BuildIndex      0        1       200000        791.7      252.63   338,552,096  1,061,814,272
> BuildIndex -  - 1 -  -   1 -  -  200000 -  -   793.1 -  - 252.18 - 605,262,080  1,061,814,272
> BuildIndex      2        1       200000        794.8      251.63   601,966,528  1,061,814,272
> BuildIndex -  - 3 -  -   1 -  -  200000 -  -   782.5 -  - 255.58 - 608,699,712  1,061,814,272
> {code}
> and with the patch:
> {code}
> Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
> BuildIndex      0        1       200000        745.0      268.47   338,318,784  1,061,814,272
> BuildIndex -  - 1 -  -   1 -  -  200000 -  -   792.7 -  - 252.30 - 605,331,776  1,061,814,272
> BuildIndex      2        1       200000        786.7      254.24   602,915,712  1,061,814,272
> BuildIndex -  - 3 -  -   1 -  -  200000 -  -   795.3 -  - 251.48 - 602,378,624  1,061,814,272
> {code}
> So it looks like the performance cost of this change is negligible (in
> the noise).
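
For readers unfamiliar with the idea, the stress test described in the issue (N streams written interleaved into shared byte[]'s, then read back and verified) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in LUCENE-1283.patch: the class ByteSliceSketch, its SliceStream type, the fixed 16-byte slice size and the 4-byte trailing link are all hypothetical names and choices made for this sketch, and the real ByteSliceWriter may lay out its slices differently.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

// Hypothetical, simplified model of byte slices in a shared buffer.
class ByteSliceSketch {
    static final int SLICE = 16; // payload bytes per slice (assumed fixed size)
    static final int LINK = 4;   // trailing bytes holding the next slice's address

    byte[] pool = new byte[1024]; // one buffer shared by all streams
    int used = 0;                 // next free offset in the pool

    // Write state for one logical stream.
    static final class SliceStream {
        int first;      // address of the stream's first slice
        int sliceStart; // address of the slice currently being filled
        int upto;       // payload bytes already written in that slice
        int length;     // total bytes written to the stream
    }

    // Reserve a new slice at the end of the pool, doubling the pool if full.
    int newSlice() {
        if (used + SLICE + LINK > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        int start = used;
        used += SLICE + LINK;
        return start;
    }

    SliceStream open() {
        SliceStream s = new SliceStream();
        s.first = s.sliceStart = newSlice();
        return s;
    }

    // Append one byte; when the current slice is full, chain a new one to it.
    void write(SliceStream s, byte b) {
        if (s.upto == SLICE) {
            int next = newSlice();
            writeInt(s.sliceStart + SLICE, next); // store forwarding address
            s.sliceStart = next;
            s.upto = 0;
        }
        pool[s.sliceStart + s.upto++] = b;
        s.length++;
    }

    // Follow the chain of slices and return everything written to the stream.
    byte[] readAll(SliceStream s) {
        byte[] out = new byte[s.length];
        int slice = s.first;
        int pos = 0;
        for (int i = 0; i < s.length; i++) {
            if (pos == SLICE) {
                slice = readInt(slice + SLICE);
                pos = 0;
            }
            out[i] = pool[slice + pos++];
        }
        return out;
    }

    void writeInt(int at, int v) {
        pool[at] = (byte) (v >>> 24);
        pool[at + 1] = (byte) (v >>> 16);
        pool[at + 2] = (byte) (v >>> 8);
        pool[at + 3] = (byte) v;
    }

    int readInt(int at) {
        return ((pool[at] & 0xFF) << 24) | ((pool[at + 1] & 0xFF) << 16)
             | ((pool[at + 2] & 0xFF) << 8) | (pool[at + 3] & 0xFF);
    }

    // Randomly interleave writes to numStreams streams, then verify each one.
    static boolean stress(long seed, int numStreams, int numWrites) {
        Random r = new Random(seed);
        ByteSliceSketch p = new ByteSliceSketch();
        SliceStream[] streams = new SliceStream[numStreams];
        ByteArrayOutputStream[] expected = new ByteArrayOutputStream[numStreams];
        for (int i = 0; i < numStreams; i++) {
            streams[i] = p.open();
            expected[i] = new ByteArrayOutputStream();
        }
        for (int i = 0; i < numWrites; i++) {
            int k = r.nextInt(numStreams);
            byte b = (byte) r.nextInt(256);
            p.write(streams[k], b);
            expected[k].write(b);
        }
        for (int i = 0; i < numStreams; i++) {
            if (!Arrays.equals(p.readAll(streams[i]), expected[i].toByteArray())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stress(42L, 10, 100000) ? "ok" : "MISMATCH");
    }
}
```

Keeping a plain ByteArrayOutputStream per stream as the reference copy is what lets the check catch any byte that lands in the wrong slice or any forwarding address that points to the wrong place.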

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

