[jira] [Commented] (LUCENE-5722) Speed up MMapDirectory.seek()

Uwe Schindler (JIRA) Sun, 01 Jun 2014 09:13:11 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015012#comment-14015012
 ]


Uwe Schindler commented on LUCENE-5722:
---------------------------------------

I looked at the code for a very long time, and also at Robert's patch.

I found out that the subclassing issue can be solved quite easily: we don't
need to make ByteBufferIndexInput abstract. The solution would be to pass some
"unmapper" instance to the constructor that does the unmapping, so freeBuffers
does not need to be abstract. In that case we can use ByteBufferIndexInput as
a concrete class.
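
A minimal sketch of the "unmapper" idea, not the actual patch: the interface
name BufferUnmapper, the class SketchBufferInput and all method signatures are
assumptions made for illustration. It only shows how an injected callback can
replace an abstract freeBuffers-style hook so the input class can stay concrete.

```java
import java.nio.ByteBuffer;

// Hypothetical callback; the real patch may use a different name and signature.
interface BufferUnmapper {
    void unmap(ByteBuffer buffer);
}

// Simplified stand-in for ByteBufferIndexInput: concrete, not abstract.
class SketchBufferInput {
    private ByteBuffer[] buffers;
    // May be null, e.g. for clones/slices that must not unmap the parent's buffers.
    private final BufferUnmapper unmapper;

    SketchBufferInput(ByteBuffer[] buffers, BufferUnmapper unmapper) {
        this.buffers = buffers;
        this.unmapper = unmapper;
    }

    // Delegates to the injected unmapper instead of an abstract method.
    void close() {
        if (buffers == null) {
            return; // already closed, nothing to free
        }
        ByteBuffer[] toFree = buffers;
        buffers = null;
        if (unmapper != null) {
            for (ByteBuffer b : toFree) {
                unmapper.unmap(b);
            }
        }
    }

    boolean isClosed() {
        return buffers == null;
    }
}
```

In this sketch MMapDirectory would pass its own unmapping logic as the
callback, while clones and slices would pass null.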

The second issue in the multi-mmap seek is the problem with the offset. In
ByteBufferIndexInput the offset is only used in seek() and when creating
slices/clones. The idea now is to remove the offset from the base class
completely. The base class is then usable for the case where offset=0 and
multiple buffers are used. All the checks at the beginning of seek() are then
unnecessary, because they only apply to the case offset!=0. In all other cases
we already catch the out-of-bounds conditions via AIOOBE
(ArrayIndexOutOfBoundsException) and similar exceptions.

The special cases would then be:
- SingleByteBufferIndexInput extends ByteBufferIndexInput: we can remove the
assert, because offset no longer exists in the base class. We always use
ByteBuffer.slice here.
- The other special case is offset!=0 for multi-mmap: in that case we have a
second concrete subclass that just overrides seek() to do the offset checks
at the beginning and, once everything is adjusted, calls super.seek().
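
The split between an offset-free base class and an offset-aware subclass could
be sketched roughly as follows. The class names MultiBufferInput and
OffsetMultiBufferInput, the constructors and the exception messages are
assumptions for illustration, not the code of the patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Base case: multiple buffers, offset is always 0, so seek() needs no up-front checks.
class MultiBufferInput {
    protected final ByteBuffer[] buffers;
    protected final int chunkSizePower;
    protected final long chunkSizeMask;
    protected ByteBuffer curBuf;

    MultiBufferInput(ByteBuffer[] buffers, int chunkSizePower) {
        this.buffers = buffers;
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
        this.curBuf = buffers[0];
    }

    void seek(long pos) throws IOException {
        // >> preserves the sign, so a negative pos gives a negative index
        // and is caught below as ArrayIndexOutOfBoundsException.
        final int bi = (int) (pos >> chunkSizePower);
        try {
            final ByteBuffer b = buffers[bi];
            b.position((int) (pos & chunkSizeMask));
            curBuf = b;
        } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
            throw new EOFException("seek out of bounds: " + pos);
        }
    }

    // Reads from the current buffer only; switching buffers is omitted in this sketch.
    byte readByte() {
        return curBuf.get();
    }
}

// Special case: offset != 0. Only seek() is overridden to adjust the position.
class OffsetMultiBufferInput extends MultiBufferInput {
    private final long offset;

    OffsetMultiBufferInput(ByteBuffer[] buffers, int chunkSizePower, long offset) {
        super(buffers, chunkSizePower);
        this.offset = offset;
    }

    @Override
    void seek(long pos) throws IOException {
        // Needed here because pos < 0 with pos >= -offset would otherwise look valid.
        if (pos < 0L) {
            throw new IllegalArgumentException("Seeking to negative position: " + pos);
        }
        super.seek(pos + offset);
    }
}
```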

The cloning/slicing can then be done much more easily; we just include the
offset there.

Furthermore, I made a small improvement to ByteBufferIndexInput.seek() for
the case where the seek stays inside the same buffer. With the optimizations
above, the whole thing is then mostly a simple position() call on the byte
buffer with a few calculations.
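
One way the same-buffer shortcut might look is shown below. This is a sketch
under assumptions: the class name FastSeekInput, its fields and the switches
counter are invented for illustration and do not come from the patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

class FastSeekInput {
    private final ByteBuffer[] buffers;
    private final int chunkSizePower;
    private final long chunkSizeMask;
    int curBufIndex;
    ByteBuffer curBuf;
    int switches; // counts buffer changes, only here to make the behaviour visible

    FastSeekInput(ByteBuffer[] buffers, int chunkSizePower) {
        this.buffers = buffers;
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
        this.curBufIndex = 0;
        this.curBuf = buffers[0];
    }

    void seek(long pos) throws IOException {
        final int bi = (int) (pos >> chunkSizePower);
        try {
            if (bi == curBufIndex) {
                // Fast path: same buffer, just reposition it.
                curBuf.position((int) (pos & chunkSizeMask));
            } else {
                // Slow path: look up the target buffer and make it current.
                final ByteBuffer b = buffers[bi];
                b.position((int) (pos & chunkSizeMask));
                curBufIndex = bi;
                curBuf = b;
                switches++;
            }
        } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
            throw new EOFException("seek out of bounds: " + pos);
        }
    }
}
```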

I will sort all of this out and provide a patch!

> Speed up MMapDirectory.seek()
> -----------------------------
>
>                 Key: LUCENE-5722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5722
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5722.patch
>
>
> For traditional Lucene access, which is mostly sequential with an occasional
> advance(), I think this method gets drowned out in the noise.
> But for access like docvalues, it's important. Unfortunately seek() is complex
> today because of mapping multiple buffers.
> However, the very common case is that only one map is used for a given clone
> or slice.
> When there is the possibility to use only a single mapped buffer, we should
> instead take advantage of ByteBuffer.slice(), which will adjust the internal
> mmap address and remove the offset calculation. Furthermore, we don't need the
> shift/mask or even the negative check, as they are then all handled by the
> ByteBuffer API: seek is a one-liner (with try/catch of course to convert
> exceptions).
> This makes docvalues access 20% faster; I haven't tested conjunctions or
> anything like that.
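
As an illustration of the single-buffer case described in the issue text above,
here is a minimal sketch. The class name SingleBufferSliceInput and its
constructor are assumptions; only ByteBuffer.duplicate(), slice() and
position() are real JDK calls.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

class SingleBufferSliceInput {
    private final ByteBuffer buf;

    // offset and length are relative to the parent buffer.
    SingleBufferSliceInput(ByteBuffer parent, int offset, int length) {
        ByteBuffer dup = parent.duplicate(); // leave the parent's position/limit untouched
        dup.limit(offset + length);
        dup.position(offset);
        // After slice(), position 0 corresponds to parent offset, so no offset math is needed.
        this.buf = dup.slice();
    }

    void seek(long pos) throws IOException {
        try {
            // The buffer itself rejects negative or too-large positions.
            // Note: the (int) cast can wrap for very large pos; ignored in this sketch.
            buf.position((int) pos);
        } catch (IllegalArgumentException e) {
            if (pos < 0L) {
                throw new IllegalArgumentException("Seeking to negative position: " + pos, e);
            }
            throw new EOFException("seek past EOF: " + pos);
        }
    }

    byte readByte() {
        return buf.get();
    }
}
```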



