[jira] Updated: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

Ning Li (JIRA) Thu, 25 Oct 2007 18:34:15 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ning Li updated LUCENE-1035:
----------------------------

    Attachment: LUCENE-1035.patch

Coding Changes
--------------
New classes are confined to the store package, as are most of the changes.
  - Two new interfaces: BareInput and BufferPool.
  - BareInput declares a subset of IndexInput's methods, such as readBytes
    (IndexInput now implements BareInput).
  - BufferPoolLRU is a simple implementation of the BufferPool interface.
    It uses a doubly linked list for the LRU algorithm.
  - BufferPooledIndexInput is a subclass of BufferedIndexInput. It takes
    a BareInput and a BufferPool. Its implementation of BufferedIndexInput's
    readInternal reads from the BufferPool; the BufferPool serves the read
    from its cache on a hit and reads from the BareInput on a miss.
  - An FSDirectory can optionally be created with a BufferPool, whose size
    is specified by a buffer size and a number of buffers. The BufferPool
    is shared among the IndexInputs of read-only files in the directory.
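The read path above (cache hit served from the pool, miss loaded from the BareInput, least recently used buffer evicted) can be illustrated with a minimal, self-contained Java sketch. It is not the code in LUCENE-1035.patch. The method signatures of BareInput and BufferPool here are assumptions (a positional readBytes(pos, ...) rather than IndexInput's seek-then-read), the "file#block" cache key is hypothetical, and ByteArrayBareInput is a hypothetical in-memory stand-in for a file. Under these assumptions, a BufferPooledIndexInput's readInternal would simply call pool.read with its current file pointer.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// ASSUMPTION: hypothetical signature; the patch's BareInput mirrors a subset of IndexInput.
interface BareInput {
    /** Reads len bytes starting at absolute position pos into b[off..off+len). */
    void readBytes(long pos, byte[] b, int off, int len) throws IOException;
    long length();
}

// ASSUMPTION: hypothetical signature for the pool shared by a directory's inputs.
interface BufferPool {
    void read(String file, BareInput in, long pos, byte[] b, int off, int len) throws IOException;
}

/** LRU pool of fixed-size blocks: HashMap for lookup, doubly linked list for recency. */
class BufferPoolLRU implements BufferPool {
    private static final class Node {
        final String key;
        final byte[] data;
        Node prev, next;
        Node(String key, byte[] data) { this.key = key; this.data = data; }
    }

    private final int bufferSize;
    private final int numBuffers;
    private final Map<String, Node> map = new HashMap<>();
    private final Node head = new Node(null, null); // sentinel on the most recently used side
    private final Node tail = new Node(null, null); // sentinel on the least recently used side
    long hits, misses;                              // counters for computing the hit ratio

    BufferPoolLRU(int bufferSize, int numBuffers) {
        if (bufferSize <= 0 || numBuffers <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bufferSize = bufferSize;
        this.numBuffers = numBuffers;
        head.next = tail;
        tail.prev = head;
    }

    private void unlink(Node n) { n.prev.next = n.next; n.next.prev = n.prev; }

    private void addFirst(Node n) {
        n.prev = head; n.next = head.next;
        head.next.prev = n; head.next = n;
    }

    /** Returns one block: from the cache on a hit, from the BareInput on a miss. */
    private synchronized byte[] block(String file, BareInput in, long blockNo) throws IOException {
        String key = file + "#" + blockNo; // hypothetical cache key
        Node n = map.get(key);
        if (n != null) {                   // hit: move the block to the MRU end
            hits++;
            unlink(n);
            addFirst(n);
            return n.data;
        }
        misses++;                          // miss: load the block, evicting the LRU block if full
        long start = blockNo * bufferSize;
        byte[] data = new byte[(int) Math.min(bufferSize, in.length() - start)];
        in.readBytes(start, data, 0, data.length);
        if (map.size() >= numBuffers) {
            Node lru = tail.prev;
            unlink(lru);
            map.remove(lru.key);
        }
        n = new Node(key, data);
        addFirst(n);
        map.put(key, n);
        return data;
    }

    @Override
    public void read(String file, BareInput in, long pos, byte[] b, int off, int len) throws IOException {
        if (pos < 0 || pos + len > in.length()) throw new IOException("read past EOF");
        while (len > 0) {                  // a single request may span several blocks
            byte[] data = block(file, in, pos / bufferSize);
            int inBlock = (int) (pos % bufferSize);
            int n = Math.min(len, data.length - inBlock);
            System.arraycopy(data, inBlock, b, off, n);
            pos += n; off += n; len -= n;
        }
    }
}

/** Hypothetical in-memory stand-in for a file, used only for demonstration. */
class ByteArrayBareInput implements BareInput {
    private final byte[] bytes;
    ByteArrayBareInput(byte[] bytes) { this.bytes = bytes; }
    public void readBytes(long pos, byte[] b, int off, int len) { System.arraycopy(bytes, (int) pos, b, off, len); }
    public long length() { return bytes.length; }
}
```

For example, with a pool of two 4-byte buffers over a 10-byte input, reading 6 bytes at position 2 touches blocks 0 and 1 (two misses), and a subsequent read at position 0 is a hit. Keeping a HashMap alongside the linked list makes both the lookup and the move-to-front on a hit constant time.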

Unit Tests
----------
  - TestBufferPoolLRU.java is added.
  - Minor changes were made to _TestHelper.java and TestCompoundFile.java
    because they made specific assumptions about the type of IndexInput
    returned by FSDirectory.openInput.
  - All unit tests pass when I switch to always using a BufferPool.


Performance Results
-------------------
I ran some experiments using the enwiki dataset. The experiments were run on
a dual 2.0 GHz Intel Xeon server running Linux. The dataset has about 3.5M
documents, and the index built from it is more than 3 GB. The only stored
field is a unique docid, which is retrieved for each query result. All queries
are two-term AND queries generated from the dictionary. The first set of
queries returns between 1 and 1000 results, with an average of 40. The second
set returns between 1 and 3000 results, with an average of 560. All tests were
run warm.

1. Query set with an average of 40 results
  Buffer Pool Size    Hit Ratio    Queries per second
      0                 N/A            230
      16M               55%            250
      32M               63%            282
      64M               73%            345
      128M              85%            476
      256M              95%            672
      512M              98%            685

2. Query set with an average of 560 results
  Buffer Pool Size    Hit Ratio    Queries per second
      0                 N/A             27
      16M               56%             29
      32M               70%             37
      64M               89%             55
      128M              97%             67
      256M              98%             71
      512M              99%             72
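The relative speedups implied by the two tables can be computed directly from the queries-per-second column by dividing each row by the no-pool baseline. The small Java program below does exactly that; the class and field names are illustrative only, and the numbers are copied from the tables. Worked by hand: for the first set, 685/230 is about 2.98, so a 512M pool gives roughly a 3x speedup; for the second set, 72/27 is about 2.67. In both sets, going from 256M to 512M adds little (672 to 685 and 71 to 72 queries per second), consistent with the hit ratio already being 95% or higher at 256M.

```java
/** Speedup over the no-pool baseline, using the queries-per-second figures from the two tables. */
class SpeedupTable {
    static final int[] POOL_MB  = {0, 16, 32, 64, 128, 256, 512};
    static final int[] QPS_SET1 = {230, 250, 282, 345, 476, 672, 685}; // average 40 results
    static final int[] QPS_SET2 = {27, 29, 37, 55, 67, 71, 72};        // average 560 results

    /** Divides every entry by the first one (pool size 0). */
    static double[] speedups(int[] qps) {
        double[] s = new double[qps.length];
        for (int i = 0; i < qps.length; i++) s[i] = (double) qps[i] / qps[0];
        return s;
    }

    public static void main(String[] args) {
        double[] s1 = speedups(QPS_SET1);
        double[] s2 = speedups(QPS_SET2);
        for (int i = 0; i < POOL_MB.length; i++) {
            System.out.printf("%4dM  set 1: %.2fx  set 2: %.2fx%n", POOL_MB[i], s1[i], s2[i]);
        }
    }
}
```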

Of course, if the tests are run cold, if the queried portion of the index is
significantly larger than the file system cache, or if there is a lot of
pre-processing of the queries and/or post-processing of the results, the
speedup will be smaller. But where it applies, i.e. where a reasonable hit
ratio can be achieved, it should provide a good improvement.


> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> An index in a RAMDirectory performs better than one in an FSDirectory.
> But many indexes cannot fit in memory, or applications cannot afford to
> spend that much memory on the index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide a good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. where a reasonable hit ratio can be achieved, it should
> provide a good improvement over FSDirectory.

-- 
This message is automatically generated by JIRA.
