Dawid Weiss created LUCENE-8406:
-----------------------------------

             Summary: Make ByteBufferIndexInput public
                 Key: LUCENE-8406
                 URL: https://issues.apache.org/jira/browse/LUCENE-8406
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Dawid Weiss
            Assignee: Dawid Weiss
             Fix For: 6.7


The logic for handling byte buffer splits, closing them properly (cleaner), and 
all the trickery involved in slicing, cloning and exception handling is 
quite daunting. 

While ByteBufferIndexInput.newInstance(..) is public, the parent class 
ByteBufferIndexInput is not. I think we should make the parent class public so 
that advanced users can reuse this (complex) piece of code to create an 
IndexInput backed by a sequence of ByteBuffers.
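
To illustrate what "an input backed by a sequence of ByteBuffers" involves, here is a minimal, self-contained sketch. It is not Lucene's ByteBufferIndexInput; the class ChunkedInput and its methods (of, readByte, slice) are hypothetical names invented for this example. It assumes that every chunk except the last holds exactly 2^chunkSizePower bytes, and it shows only positional reads and slicing, leaving out buffer cleaning and the exception handling mentioned above.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a read-only input over several ByteBuffers (not Lucene code). */
final class ChunkedInput {
    private final ByteBuffer[] chunks;  // all chunks but the last hold 2^chunkSizePower bytes
    private final int chunkSizePower;
    private final long chunkMask;
    private final long offset;          // where this view starts in the underlying data
    private final long length;          // number of bytes visible through this view

    ChunkedInput(ByteBuffer[] chunks, int chunkSizePower, long offset, long length) {
        // duplicate() shares the bytes but gives each view its own position and limit,
        // so views never mutate each other's buffer state.
        this.chunks = new ByteBuffer[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            this.chunks[i] = chunks[i].duplicate();
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkMask = (1L << chunkSizePower) - 1;
        this.offset = offset;
        this.length = length;
    }

    /** Splits a byte[] into fixed-size wrapped chunks. */
    static ChunkedInput of(byte[] data, int chunkSizePower) {
        int chunkSize = 1 << chunkSizePower;
        int count = (data.length + chunkSize - 1) / chunkSize;
        ByteBuffer[] chunks = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int start = i * chunkSize;
            int len = Math.min(chunkSize, data.length - start);
            // slice() makes index 0 of the chunk correspond to data[start].
            chunks[i] = ByteBuffer.wrap(data, start, len).slice();
        }
        return new ChunkedInput(chunks, chunkSizePower, 0, data.length);
    }

    /** Positional read: picks the chunk by shifting, the index inside it by masking. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos + ", length=" + length);
        }
        long abs = offset + pos;
        ByteBuffer chunk = chunks[(int) (abs >>> chunkSizePower)];
        return chunk.get((int) (abs & chunkMask)); // absolute get, no shared cursor
    }

    /** Returns a view over a sub-range without copying any bytes. */
    ChunkedInput slice(long sliceOffset, long sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        return new ChunkedInput(chunks, chunkSizePower, offset + sliceOffset, sliceLength);
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ChunkedInput in = ChunkedInput.of(data, 2); // 4-byte chunks
        ChunkedInput view = in.slice(3, 6);
        System.out.println(in.readByte(5) + " " + view.readByte(0));
    }
}
```

Because reads are positional and each view owns its buffer duplicates, no locking is needed between readers in this sketch. The real class has to handle considerably more than this, which is the argument for exposing it rather than having users rewrite it.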

The specific use case I have in mind is RAMDirectory, which currently uses 
a custom IndexInput implementation, which in turn calls RAMFile's 
synchronized methods. This causes quite dramatic contention on 
multithreaded systems. While we clearly discourage the use of RAMDirectory 
in production environments, there really is no need for it to be slow. If 
modified only slightly (to use a ByteBuffer-based input), its performance is on 
par with FSDirectory. Here's a sample log comparing FSDirectory with 
RAMDirectory and the "modified" RAMDirectory making use of the ByteBuffer input:

{code}
14:26:40 INFO  console: FSDirectory index.
14:26:41 INFO  console: Opened with 299943 documents.
14:26:50 INFO  console: Finished: 8.820 s, 240000 matches.

14:26:50 INFO  console: RAMDirectory index.
14:26:50 INFO  console: Opened with 299943 documents.
14:28:50 INFO  console: Finished: 2.012 min, 240000 matches.

14:28:50 INFO  console: RAMDirectory2 index (wrapped byte[] buffers).
14:28:50 INFO  console: Opened with 299943 documents.
14:29:00 INFO  console: Finished: 9.215 s, 240000 matches.

14:29:00 INFO  console: RAMDirectory2 index (direct memory buffers).
14:29:00 INFO  console: Opened with 299943 documents.
14:29:08 INFO  console: Finished: 8.817 s, 240000 matches.
{code}

Note the performance difference is an order of magnitude on this 32-CPU system 
(2 minutes vs. 9 seconds). The tiny performance difference between the 
implementation based on direct memory buffers and the one based on buffers 
acquired via ByteBuffer.wrap(byte[]) is probably due to the fact that direct 
buffers access their data via Unsafe, whereas the wrapped counterparts use 
regular Java array access (my best guess).
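
For readers unfamiliar with the two buffer kinds compared above, the following small sketch shows how each is created with the standard java.nio API. The class name BufferKinds and its helper methods are hypothetical and exist only for illustration; the sketch makes no claim about relative speed and is not the benchmark that produced the log.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper contrasting heap-backed and direct buffers (illustration only). */
final class BufferKinds {
    /** Heap buffer: a view over an existing byte[], no copy is made. */
    static ByteBuffer wrapped(byte[] data) {
        return ByteBuffer.wrap(data);
    }

    /** Direct buffer: memory allocated outside the Java heap, filled by copying. */
    static ByteBuffer direct(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
        buf.put(data);
        buf.flip(); // reset position to 0 and set limit to the number of bytes written
        return buf;
    }

    /** Sums all bytes using absolute reads; works the same for both kinds. */
    static long sum(ByteBuffer buf) {
        long total = 0;
        for (int i = 0; i < buf.limit(); i++) {
            total += buf.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        ByteBuffer heap = wrapped(data);
        ByteBuffer offHeap = direct(data);
        System.out.println("wrapped: direct=" + heap.isDirect() + ", hasArray=" + heap.hasArray());
        System.out.println("direct:  direct=" + offHeap.isDirect());
        System.out.println("sums: " + sum(heap) + " " + sum(offHeap));
    }
}
```

Both kinds expose the same ByteBuffer interface, so an input built on ByteBuffer can be backed by either without changes to the reading code.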



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
