[jira] Commented: (LUCENE-753) Use NIO positional read to avoid synchronization in FSIndexInput

Michael McCandless (JIRA) Sat, 19 Jul 2008 03:31:55 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614974#action_12614974
 ]


Michael McCandless commented on LUCENE-753:
-------------------------------------------

I created a large index (indexed Wikipedia 4 times over, with stored
fields & term vector offsets/positions = 72 GB).  I then randomly
sampled 50 terms with freq > 1 million, plus 200 terms with freq >
100,000, plus 100 terms with freq > 10,000, plus 100 terms with freq >
1000.  Then I warmed the OS so these queries are fully cached in the
IO cache.

It's a highly synthetic test.  I'd really love to test on real
queries, instead of single term queries.

Then I ran this alg:

{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

query.maker = org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file = /lucene/wikiQueries.txt

directory=FSDirectory
pool=true

work.dir=/lucene/bigwork

OpenReader

{ "Warmup" SearchTrav(20) > : 5

{ "Rounds"
  [{ "Search" Search > : 500]: 16
  NewRound
}: 2

CloseReader 

RepSumByPrefRound Search
{code}

I ran with 2, 4, 8 and 16 threads, on an Intel quad Mac Pro (2 CPUs,
each dual core) running OS X 10.5.4, with 6 GB RAM, Sun JRE 1.6.0_05
and a single WD Velociraptor hard drive.  To keep the total number of
searches constant I changed the 500 count above to match (i.e. with 8
threads I changed 500 -> 1000, with 4 threads I changed it to 2000,
etc.).
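
As a quick sanity check on those counts: the parallel block in the alg
runs 16 threads x 500 searches = 8000 searches, so the per-thread
count for N threads is 8000 / N.  A minimal sketch (the class and
method names are made up for illustration, and it assumes the thread
count in the alg is changed alongside the 500):

```java
// Hypothetical helper, not part of the benchmark package.
final class SearchesPerThread {
    // Total searches in the parallel block of the alg: 16 threads x 500 each.
    static final int TOTAL_SEARCHES = 16 * 500;

    // Per-thread repeat count that keeps the total constant.
    static int perThread(int threads) {
        if (threads <= 0 || TOTAL_SEARCHES % threads != 0) {
            throw new IllegalArgumentException("thread count must evenly divide " + TOTAL_SEARCHES);
        }
        return TOTAL_SEARCHES / threads;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {2, 4, 8, 16}) {
            System.out.println(threads + " threads -> " + perThread(threads) + " searches each");
        }
    }
}
```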

Here're the results -- each run is best of 2, and all searches are
fully cached in OS's IO cache:

||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|78.7|74.9|5.1%|
|4|74.1|68.2|8.7%|
|8|37.7|32.7|15.3%|
|16|19.2|16.3|17.8%|
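
The "Pctg gain" column is consistent with (patch / trunk - 1) * 100,
rounded to one decimal.  A small sketch that recomputes it from the
rec/s figures in the table above (the class and method names are
hypothetical; the formula is inferred from the numbers, not stated
anywhere in the benchmark output):

```java
// Hypothetical helper for recomputing the gain column.
final class PctGain {
    // Relative throughput gain of the patch over trunk, in percent.
    static double pctGain(double patchRecPerSec, double trunkRecPerSec) {
        return (patchRecPerSec / trunkRecPerSec - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Rows are {threads, patch rec/s, trunk rec/s}, copied from the table.
        double[][] rows = {
            {2, 78.7, 74.9},
            {4, 74.1, 68.2},
            {8, 37.7, 32.7},
            {16, 19.2, 16.3},
        };
        for (double[] r : rows) {
            System.out.printf("%d threads: %.1f%%%n", (int) r[0], pctGain(r[1], r[2]));
        }
    }
}
```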

I also ran the same alg, replacing the Search task with
SearchTravRet(10), which retrieves the first 10 docs (hits) of each
search, again warming first so it's all fully cached:

||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|1589.6|1519.8|4.6%|
|4|1460.9|1395.3|4.7%|
|8|748.9|676.0|10.8%|
|16|382.7|338.4|13.1%|

So there are smallish gains, but remember these are upper bounds on
the gains because no pooling is happening.  I'll test uncached next.


> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
>                 Key: LUCENE-753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-753
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Yonik Seeley
>         Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java, 
> FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java, 
> FSDirectoryPool.patch, FSIndexInput.patch, FSIndexInput.patch, 
> lucene-753.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the 
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of 
> files in the index format.
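
For readers unfamiliar with the idea in the description, here is a
minimal sketch of a positional read using the standard
FileChannel.read(ByteBuffer, long) call.  It is not the code from the
attached patches; the class and method names are hypothetical.  The
call does not move the channel's own position, so threads sharing one
channel do not have to synchronize around a seek + read pair.  Whether
that actually removes contention depends on the platform and JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: names are hypothetical, not taken from the patches.
final class PositionalReadSketch {

    // Reads up to len bytes starting at absolute file offset pos.
    // The channel's own position is left untouched.
    static byte[] readAt(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                break; // reached end of file before len bytes were read
            }
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Self-check: write ten bytes to a temp file, then read four from offset 3.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("pread-sketch", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    return new String(readAt(ch, 3, 4), StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Several threads could call readAt on the same channel with different
offsets without any external lock, which is the property the issue
title refers to.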
