[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Li updated LUCENE-1035:
----------------------------
Attachment: LUCENE-1035.patch
Coding Changes
--------------
New classes are localized to the store package, and so are most of the changes.
- Two new interfaces: BareInput and BufferPool.
- BareInput contains a subset of IndexInput's methods, such as readBytes
(IndexInput now implements BareInput).
- BufferPoolLRU is a simple implementation of the BufferPool interface.
It uses a doubly linked list for the LRU algorithm.
- BufferPooledIndexInput is a subclass of BufferedIndexInput. It takes
a BareInput and a BufferPool. Its readInternal reads from the BufferPool,
and the BufferPool serves the request from its cache on a hit and reads
from the BareInput on a miss.
- An FSDirectory object can optionally be created with a BufferPool whose
size is specified by a buffer size and a number of buffers. The BufferPool
is shared among the IndexInputs of read-only files in the directory.
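To make the read path described above concrete, here is a minimal Java sketch of an LRU buffer pool built from a HashMap plus a hand-rolled doubly linked list, in the spirit of BufferPoolLRU. It is not the code in LUCENE-1035.patch: the class name BufferPoolLRUSketch, the nested BareInput interface with its positional readBlock(position, dest, len) and length() methods, the "fileName#blockNo" cache key, and the read(...) loop standing in for readInternal are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an LRU buffer pool; not the classes from LUCENE-1035.patch.
class BufferPoolLRUSketch {

    /** Hypothetical stand-in for BareInput: positional, unbuffered reads from one file. */
    interface BareInput {
        /** Reads len bytes starting at file offset position into dest[0..len). */
        void readBlock(long position, byte[] dest, int len) throws IOException;
        long length();
    }

    /** Node of the doubly linked LRU list; one node per cached buffer. */
    private static final class Node {
        final String key;
        final byte[] data;
        Node prev, next;
        Node(String key, byte[] data) { this.key = key; this.data = data; }
    }

    private final int bufferSize;
    private final int numBuffers;
    private final Map<String, Node> index = new HashMap<>();
    private final Node head = new Node(null, null); // sentinel on the most-recently-used end
    private final Node tail = new Node(null, null); // sentinel on the least-recently-used end
    private long hits, misses;

    BufferPoolLRUSketch(int bufferSize, int numBuffers) {
        if (bufferSize <= 0 || numBuffers <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bufferSize = bufferSize;
        this.numBuffers = numBuffers;
        head.next = tail;
        tail.prev = head;
    }

    /** Copies len bytes at file offset pos into b[off..], one pool buffer at a time
     *  (roughly the job readInternal does in the described BufferPooledIndexInput). */
    void read(String fileName, BareInput input, long pos, byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            byte[] block = getBlock(fileName, input, pos / bufferSize);
            int inBlock = (int) (pos % bufferSize);
            int n = Math.min(len, block.length - inBlock);
            if (n <= 0) throw new IOException("read past EOF");
            System.arraycopy(block, inBlock, b, off, n);
            pos += n;
            off += n;
            len -= n;
        }
    }

    /** Returns the cached buffer for (fileName, blockNo); on a miss, loads it from input. */
    synchronized byte[] getBlock(String fileName, BareInput input, long blockNo) throws IOException {
        String key = fileName + "#" + blockNo;
        Node node = index.get(key);
        if (node != null) {               // hit: move the buffer to the MRU end
            hits++;
            unlink(node);
            linkAfterHead(node);
            return node.data;             // shared buffer: callers must not modify it
        }
        misses++;                         // miss: read one whole buffer from the BareInput
        long start = blockNo * bufferSize;
        if (start >= input.length()) throw new IOException("read past EOF");
        byte[] data = new byte[(int) Math.min(bufferSize, input.length() - start)];
        input.readBlock(start, data, data.length);
        if (index.size() == numBuffers) { // pool full: evict the least recently used buffer
            Node lru = tail.prev;
            unlink(lru);
            index.remove(lru.key);
        }
        node = new Node(key, data);
        index.put(key, node);
        linkAfterHead(node);
        return data;
    }

    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    private void unlink(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void linkAfterHead(Node n) {
        n.next = head.next;
        n.prev = head;
        head.next.prev = n;
        head.next = n;
    }
}
```

Both a hit (move to head) and an eviction (unlink at tail) are O(1), which is the usual reason for pairing a hash map with a doubly linked list. getBlock is synchronized because, as described above, one pool is shared by all IndexInputs of the directory; the total memory used is at most bufferSize times numBuffers bytes.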
Unit Tests
----------
- TestBufferPoolLRU.java is added.
- Minor changes were made to _TestHelper.java and TestCompoundFile.java
because they made specific assumptions about the type of IndexInput returned
by FSDirectory.openInput.
- All unit tests pass when I switch to always using a BufferPool.
Performance Results
-------------------
I ran some experiments using the enwiki dataset on a dual 2.0GHz Intel Xeon
server running Linux. The dataset has about 3.5M documents, and the index
built from it is larger than 3GB. The only stored field is a unique docid,
which is retrieved for each query result. All queries are two-term AND
queries generated from the dictionary. The first set of queries returns
between 1 and 1000 results, with an average of 40; the second set returns
between 1 and 3000, with an average of 560. All tests were run warm.
1. Query set with an average of 40 results

Buffer Pool Size   Hit Ratio   Queries per Second
0                  N/A         230
16M                55%         250
32M                63%         282
64M                73%         345
128M               85%         476
256M               95%         672
512M               98%         685

2. Query set with an average of 560 results

Buffer Pool Size   Hit Ratio   Queries per Second
0                  N/A         27
16M                56%         29
32M                70%         37
64M                89%         55
128M               97%         67
256M               98%         71
512M               99%         72
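The tables report absolute throughput; dividing each row by the no-pool row gives the relative speedup. A small Java snippet doing that arithmetic on the numbers copied from the two tables (the class and method names are illustrative only):

```java
// Speedup of each buffer-pool size relative to the no-pool baseline,
// using the queries-per-second figures from the two tables above.
class SpeedupFromTables {
    static final int[] POOL_MB = {0, 16, 32, 64, 128, 256, 512}; // 0 = no buffer pool
    static final int[] QPS_SET1 = {230, 250, 282, 345, 476, 672, 685};
    static final int[] QPS_SET2 = {27, 29, 37, 55, 67, 71, 72};

    /** Throughput of each configuration divided by the baseline at index 0. */
    static double[] speedups(int[] qps) {
        double[] s = new double[qps.length];
        for (int i = 0; i < qps.length; i++) {
            s[i] = (double) qps[i] / qps[0];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] s1 = speedups(QPS_SET1);
        double[] s2 = speedups(QPS_SET2);
        for (int i = 0; i < POOL_MB.length; i++) {
            System.out.printf("%4dM  set1 %.2fx  set2 %.2fx%n", POOL_MB[i], s1[i], s2[i]);
        }
    }
}
```

For example, the 512M pool gives 685/230 ≈ 2.98x on the first query set and 72/27 ≈ 2.67x on the second, while the first set's 256M row already reaches 672/230 ≈ 2.92x.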
Of course, if the tests are run cold, if the queried portion of the index
is significantly larger than the file system cache, or if there is a lot of
pre-processing of the queries and/or post-processing of the results, the
speedup will be smaller. But where it applies, i.e. where a reasonable hit
ratio can be achieved, it should provide a good improvement.
> Optional Buffer Pool to Improve Search Performance
> -------------------------------------------------
>
> Key: LUCENE-1035
> URL: https://issues.apache.org/jira/browse/LUCENE-1035
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Reporter: Ning Li
> Attachments: LUCENE-1035.patch
>
>
> An index in RAMDirectory performs better than one in FSDirectory.
> But many indexes cannot fit in memory, or applications cannot afford to
> spend that much memory on the index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide a good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. where a reasonable hit ratio can be achieved, it should
> provide a good improvement over FSDirectory.
--
This message is automatically generated by JIRA.