[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Li updated LUCENE-1035:
----------------------------

    Attachment: LUCENE-1035.patch

Coding Changes
--------------
The new classes are localized to the store package, as are most of the changes.
  - Two new interfaces: BareInput and BufferPool.
  - BareInput takes a subset of IndexInput's methods such as readBytes
    (IndexInput now implements BareInput).
  - BufferPoolLRU is a simple implementation of the BufferPool interface.
    It uses a doubly linked list for the LRU algorithm.
  - BufferPooledIndexInput is a subclass of BufferedIndexInput. It takes
    a BareInput and a BufferPool. Its readInternal reads from the
    BufferPool, which serves the request from its cache on a hit and
    reads from the BareInput on a miss.
  - An FSDirectory object can optionally be created with a BufferPool,
    whose size is specified by a buffer size and a number of buffers.
    The BufferPool is shared among the IndexInputs of read-only files in
    the directory.

Unit tests
  - TestBufferPoolLRU.java is added.
  - Minor changes were made to _TestHelper.java and TestCompoundFile.java
    because they made specific assumptions about the type of IndexInput
    returned by FSDirectory.openInput.
  - All unit tests pass when I switch to always using a BufferPool.

Performance Results
-------------------
I ran some experiments using the enwiki dataset. The experiments were run
on a dual 2.0 GHz Intel Xeon server running Linux. The dataset has about
3.5M documents and the index built from it is more than 3 GB. The only
stored field is a unique docid, which is retrieved for each query result.
All queries are two-term AND queries generated from the dictionary. The
first set of queries returns between 1 and 1000 results with an average
of 40. The second set returns between 1 and 3000 results with an average
of 560. All tests were run warm.

1 Query set with average 40 results

  Buffer Pool Size    Hit Ratio    Queries per second
  0                   N/A          230
  16M                 55%          250
  32M                 63%          282
  64M                 73%          345
  128M                85%          476
  256M                95%          672
  512M                98%          685

2 Query set with average 560 results

  Buffer Pool Size    Hit Ratio    Queries per second
  0                   N/A          27
  16M                 56%          29
  32M                 70%          37
  64M                 89%          55
  128M                97%          67
  256M                98%          71
  512M                99%          72

Of course, if the tests are run cold, or if the queried portion of the
index is significantly larger than the file system cache, or if there is
a lot of pre-processing of the queries and/or post-processing of the
results, the speedup will be less. But where it applies, i.e. where a
reasonable hit ratio can be achieved, it should provide a good
improvement.

> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> Index in RAMDirectory provides better performance over that in FSDirectory.
> But many indexes cannot fit in memory or applications cannot afford to
> spend that much memory on index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. a reasonable hit ratio can be achieved, it should provide
> a good improvement over FSDirectory.
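
For readers who want a concrete picture of the read path described under
Coding Changes, here is a minimal sketch of an LRU pool of fixed-size
buffers. It is not the code in LUCENE-1035.patch. BlockSource and
LruBufferPoolSketch are hypothetical names standing in for BareInput and
BufferPoolLRU, and readAt is an assumed signature. The sketch also uses
java.util.LinkedHashMap in access order (which maintains a doubly linked
list internally) instead of a hand-written list, and it keys buffers by
the block number of a single file. A pool shared across files, as in the
patch, would also need the file identity in the key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for BareInput: positional reads only. */
interface BlockSource {
    /** Reads up to len bytes starting at file position pos into dst[0..]; returns bytes read. */
    int readAt(long pos, byte[] dst, int len);
}

/** Sketch of an LRU pool of fixed-size buffers keyed by block number (assumed design). */
class LruBufferPoolSketch {
    private final int bufferSize;
    private final Map<Long, byte[]> cache;
    private long hits;
    private long misses;

    LruBufferPoolSketch(final int bufferSize, final int numBuffers) {
        this.bufferSize = bufferSize;
        // accessOrder=true keeps entries ordered from least to most recently used;
        // removeEldestEntry evicts the least recently used buffer once the pool is full.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > numBuffers;
            }
        };
    }

    /** Copies len bytes at position pos into dst, touching the source only on a miss. */
    synchronized void read(BlockSource source, long pos, byte[] dst, int offset, int len) {
        while (len > 0) {
            long block = pos / bufferSize;
            int within = (int) (pos % bufferSize);
            byte[] buf = cache.get(block);          // a hit also marks the block as recently used
            if (buf == null) {
                misses++;
                buf = new byte[bufferSize];
                source.readAt(block * bufferSize, buf, bufferSize);
                cache.put(block, buf);              // may evict the least recently used block
            } else {
                hits++;
            }
            int n = Math.min(len, bufferSize - within);
            System.arraycopy(buf, within, dst, offset, n);
            pos += n;
            offset += n;
            len -= n;
        }
    }

    /** Fraction of block lookups served from the pool, comparable to the Hit Ratio column. */
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

In this sketch the pool capacity is bufferSize * numBuffers bytes, which
mirrors how the directory-level pool is sized by a buffer size and a
number of buffers. A single lock around read is the simplest choice for
a pool shared by many inputs; it is an assumption of the sketch and says
nothing about how the patch handles concurrency.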