[
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550436#comment-13550436
]
Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 9:25 PM:
--------------------------------------------------------------
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't
readByte(), readByte(), readByte() and hopefully avoids some of the bounds
checks and so on that I think it helped with.
{quote}
Actually there is still quite a lot of that. I wrote a Directory implementation locally
that dumps out all of the called operations; I can share the trace file if wanted
(although it's *huge*).
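For illustration only, a stripped-down version of such a tracing wrapper might look like the sketch below (written against the Lucene 4.x IndexInput API; this is a sketch, not the implementation used here, and a real one would also need to override clone()):
{code:java}
// Illustrative sketch only: wraps an IndexInput and logs every read so you can
// see how often single-byte reads happen versus bulk reads.
import org.apache.lucene.store.IndexInput;

import java.io.IOException;
import java.io.PrintStream;

public class LoggingIndexInput extends IndexInput {
  private final IndexInput in;   // the real input being observed
  private final PrintStream log; // where the trace is written

  public LoggingIndexInput(IndexInput in, PrintStream log) {
    super("LoggingIndexInput(" + in + ")");
    this.in = in;
    this.log = log;
  }

  @Override
  public byte readByte() throws IOException {
    log.println("readByte @ " + in.getFilePointer());
    return in.readByte();
  }

  @Override
  public void readBytes(byte[] b, int offset, int len) throws IOException {
    log.println("readBytes len=" + len + " @ " + in.getFilePointer());
    in.readBytes(b, offset, len);
  }

  @Override
  public void seek(long pos) throws IOException {
    log.println("seek " + pos);
    in.seek(pos);
  }

  @Override
  public long getFilePointer() { return in.getFilePointer(); }

  @Override
  public long length() { return in.length(); }

  @Override
  public void close() throws IOException { in.close(); }
}
{code}
Wiring it in just means a Directory wrapper whose openInput() returns one of these; grepping the trace then shows how often readByte() is still hit versus bulk readBytes().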
{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can
do that efficiently using a memcpy(). Some MTQs are still faster because they
read many more blocks for a large number of terms. I would have expected no
significant speed-up at all for, e.g., NRQ.
{quote}
Better yet, the JVM doesn't do a plain memcpy in all cases; it often uses CPU-aware
operations that are faster.
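To make that concrete, here is a standalone sketch (not from any patch on this issue) contrasting byte-at-a-time gets with a single bulk get() on a mapped buffer; the file name is just a placeholder:
{code:java}
// Illustration only: the difference between byte-at-a-time reads and a single
// bulk get() on a mapped buffer. "segment.bin" stands in for any index file.
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class BulkVsSingleByte {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile("segment.bin", "r");
         FileChannel ch = raf.getChannel()) {
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

      byte[] block = new byte[128];

      // Old pattern: one get() per byte, with a bounds check and position
      // update paid on every single call.
      for (int i = 0; i < block.length; i++) {
        block[i] = map.get();
      }

      map.position(0);

      // Block-codec pattern: one bulk get() for the whole block; the JVM can
      // service this with an optimized copy rather than a per-byte loop.
      map.get(block, 0, block.length);
    }
  }
}
{code}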
{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer
Java versions use intrinsics, which may no longer be used with your directory
impl.
{quote}
This is what I am leaning towards. So far, the only speedups I have seen are
when I mimic most of the behaviors of the JVM. The biggest win really is that the
code becomes a lot simpler (partly because we don't have to worry about the
"cleaner", and partly because we are not bound to int32 sizes, so no more of the
slice nonsense sketched below); despite the simpler code, I don't think there is a
sizable enough win in performance to warrant this approach.
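For anyone following along, the "slice nonsense" is the bookkeeping a ByteBuffer-backed input needs because a single ByteBuffer can only address Integer.MAX_VALUE bytes; it boils down to arithmetic like this (illustrative names, not Lucene's actual code):
{code:java}
// Illustrative only: translating an absolute long file offset into
// (chunk index, offset within chunk) when the file is mapped as a series of
// <= 2 GiB ByteBuffers. A native mmap of the whole file avoids all of this.
public class ChunkedOffsets {
  // e.g. 1 GiB chunks; must be a power of two for the shift/mask trick below
  static final int CHUNK_SIZE_POWER = 30;
  static final long CHUNK_MASK = (1L << CHUNK_SIZE_POWER) - 1;

  /** Which mapped chunk holds the given absolute file offset. */
  static int chunkIndex(long fileOffset) {
    return (int) (fileOffset >>> CHUNK_SIZE_POWER);
  }

  /** Offset of that position inside its chunk (always fits in an int). */
  static int offsetInChunk(long fileOffset) {
    return (int) (fileOffset & CHUNK_MASK);
  }

  public static void main(String[] args) {
    long pos = 5L * 1024 * 1024 * 1024 + 12345; // 5 GiB + 12345 bytes
    System.out.println("chunk=" + chunkIndex(pos) + " offset=" + offsetInChunk(pos));
    // prints: chunk=5 offset=12345
  }
}
{code}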
I am still poking at this for a bit longer, but I am leaning towards calling
this a bust.
The other reason for this was to see if I could get better behavior on the
MADV_WILLNEED / page-alignment fronts, but again I have nothing scientifically
provable there.
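For reference, the page-alignment side of a MADV_WILLNEED hint comes down to arithmetic like the sketch below; the addresses here are made up, and the actual madvise(2) call would sit behind a JNI binding that is not shown:
{code:java}
// Sketch of the page-alignment bookkeeping a MADV_WILLNEED hint needs:
// madvise(2) must be handed a page-aligned start address, so a request for an
// arbitrary file range has to be rounded out to page boundaries first.
public class PageAlign {
  static final long PAGE_SIZE = 4096; // typical x86-64 page size; really sysconf(_SC_PAGESIZE)

  static long alignDown(long addr) { return addr & ~(PAGE_SIZE - 1); }
  static long alignUp(long addr)   { return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1); }

  public static void main(String[] args) {
    long base   = 0x7f0000001000L; // pretend mmap base address
    long offset = 12345;           // start of the region we want resident
    long length = 100000;          // how much of it we want

    long start = alignDown(base + offset);
    long end   = alignUp(base + offset + length);
    System.out.printf("madvise(0x%x, %d, MADV_WILLNEED)%n", start, end - start);
  }
}
{code}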
(This is all assuming that I don't have some gross oversight in my
implementation that makes it stupidly slow by accident.)
{quote}
I would not provide a custom MMapDir at all; it is too risky and does not
really bring a large speed-up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins, I would still put it in
the bucket of "it's in misc, it's not the default, and you're on your own if it breaks".
The fact that it yields, AFAICT, no performance gains is both maddening for me and
even more damning.
> Native MMapDir
> --------------
>
> Key: LUCENE-3178
> URL: https://issues.apache.org/jira/browse/LUCENE-3178
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Attachments: LUCENE-3178-Native-MMap-implementation.patch,
> LUCENE-3178-Native-MMap-implementation.patch,
> LUCENE-3178-Native-MMap-implementation.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create a native Dir impl (UnixDirectory) to pass the right OS-level
> IO flags depending on the IOContext, we could in theory do something similar with
> MMapDir.
> The problem is that MMap is apparently quite hairy... and to pass the flags the
> native code would need to invoke mmap (I think?), unlike UnixDir where the
> code "only" has to open the file handle.