[
https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614974#action_12614974
]
Michael McCandless commented on LUCENE-753:
-------------------------------------------
I created a large index (indexed Wikipedia 4 times over, with stored
fields & tv offsets/positions = 72 GB). I then randomly sampled 50
terms > 1 million freq, plus 200 terms > 100,000 freq, plus 100 terms >
10,000 freq, plus 100 terms > 1000 freq. Then I warmed the OS so these
queries are fully cached in the IO cache.
It's a highly synthetic test. I'd really love to test on real
queries, instead of single term queries.
Then I ran this alg:
{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
query.maker = org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file = /lucene/wikiQueries.txt
directory=FSDirectory
pool=true
work.dir=/lucene/bigwork
OpenReader
{ "Warmup" SearchTrav(20) > : 5
{ "Rounds"
[{ "Search" Search > : 500]: 16
NewRound
}: 2
CloseReader
RepSumByPrefRound Search
{code}
I ran with 2, 4, 8 and 16 threads, on an Intel quad Mac Pro (2 CPUs,
each dual core), OS X 10.5.4, with 6 GB RAM, Sun JRE 1.6.0_05 and a
single WD Velociraptor hard drive. To keep the total number of searches
constant I changed the 500 count above to match (i.e. with 8 threads I
changed 500 -> 1000, with 4 threads I changed it to 2000, etc.).
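To make the arithmetic explicit: the alg as posted runs 16 parallel
tasks of 500 searches each, i.e. 8000 searches per round, so the repeat
count for any other thread count is 8000 divided by that count. A small
sketch of that bookkeeping (SearchBudget and searchesPerThread are
hypothetical names used only for illustration; they are not part of the
benchmark package):

```java
// Hypothetical helper for illustration; not part of Lucene's benchmark package.
final class SearchBudget {

    // Searches per round in the alg as posted: 16 parallel tasks x 500 searches.
    static final int TOTAL_SEARCHES = 16 * 500;

    // Repeat count to substitute for 500 when running with the given thread count.
    static int searchesPerThread(int threads) {
        if (threads <= 0 || TOTAL_SEARCHES % threads != 0) {
            throw new IllegalArgumentException(
                "thread count must evenly divide " + TOTAL_SEARCHES + ": " + threads);
        }
        return TOTAL_SEARCHES / threads;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {2, 4, 8, 16}) {
            System.out.println(threads + " threads -> "
                + searchesPerThread(threads) + " searches each");
        }
    }
}
```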
Here're the results -- each run is best of 2, and all searches are
fully cached in OS's IO cache:
||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|78.7|74.9|5.1%|
|4|74.1|68.2|8.7%|
|8|37.7|32.7|15.3%|
|16|19.2|16.3|17.8%|
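The Pctg gain column is consistent with (patch - trunk) / trunk,
rounded to one decimal place. A quick way to recheck a row, as a sketch
(GainCheck and pctGain are made-up names; the numbers are copied from
the table above):

```java
import java.util.Locale;

// Hypothetical helper for rechecking the gain column; names are illustrative.
final class GainCheck {

    // Relative throughput gain of the patch over trunk, in percent.
    static double pctGain(double patchRecPerSec, double trunkRecPerSec) {
        return (patchRecPerSec - trunkRecPerSec) / trunkRecPerSec * 100.0;
    }

    public static void main(String[] args) {
        // Rows of the Search table: {threads, patch rec/s, trunk rec/s}.
        double[][] rows = {
            {2, 78.7, 74.9},
            {4, 74.1, 68.2},
            {8, 37.7, 32.7},
            {16, 19.2, 16.3},
        };
        for (double[] row : rows) {
            System.out.println(String.format(Locale.ROOT, "%d threads: %.1f%%",
                (int) row[0], pctGain(row[1], row[2])));
        }
    }
}
```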
I also ran the same alg, replacing Search task with SearchTravRet(10)
(retrieves the first 10 docs (hits) of each search), first warming so
it's all fully cached:
||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|1589.6|1519.8|4.6%|
|4|1460.9|1395.3|4.7%|
|8|748.9|676.0|10.8%|
|16|382.7|338.4|13.1%|
So there are smallish gains, but remember these are upper bounds on
the gains because no pooling is happening. I'll test uncached next.
> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
> Key: LUCENE-753
> URL: https://issues.apache.org/jira/browse/LUCENE-753
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Reporter: Yonik Seeley
> Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java,
> FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java,
> FSDirectoryPool.patch, FSIndexInput.patch, FSIndexInput.patch,
> lucene-753.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of
> files in the index format.
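For readers unfamiliar with the API the description refers to: in Java,
a positional read is FileChannel.read(ByteBuffer, long), which reads at
an explicit file offset and leaves the channel's own position alone, so
threads sharing one channel do not need a synchronized seek-then-read.
The following is only a minimal sketch of that idea and not the code in
the attached patches; PositionalReadSketch and readAt are hypothetical
names, and whether the reads actually proceed in parallel depends on
the JVM and OS.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration only; not the code in the attached patches.
final class PositionalReadSketch {

    // Reads exactly len bytes starting at file offset pos.
    // FileChannel.read(ByteBuffer, long) does not modify the channel's
    // position, so no shared seek and no synchronized block is needed here.
    static byte[] readAt(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("read past EOF at offset " + (pos + buf.position()));
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        // Write a small scratch file so the example is self-contained.
        File file = File.createTempFile("pread", ".bin");
        file.deleteOnExit();
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(data);
        } finally {
            out.close();
        }

        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel channel = raf.getChannel();
            // Four threads share one channel and read disjoint 256-byte ranges.
            Thread[] readers = new Thread[4];
            for (int t = 0; t < readers.length; t++) {
                long offset = t * 256L;
                readers[t] = new Thread(() -> {
                    try {
                        byte[] chunk = readAt(channel, offset, 256);
                        System.out.println("read " + chunk.length + " bytes at offset " + offset);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                readers[t].start();
            }
            for (Thread reader : readers) {
                reader.join();
            }
        } finally {
            raf.close();
        }
    }
}
```

The contrast is with a seek followed by a read on a shared
RandomAccessFile, where the two calls have to be guarded by a lock
because the file pointer is shared state.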