Performance improvement for Lucene.net with memory mapped files.

Van Den Berghe, Vincent Sat, 25 Feb 2017 13:42:06 -0800

Hello (again),

During performance analysis of an index of 25 million documents, using queries 
with 50 or more clauses, a hotspot was spotted (no pun intended) in the 
following ByteBuffer method:


        public virtual ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            int end = offset + length;
            for (int i = offset; i < end; i++)
                dst[i] = Get();

            return this;
        }


This fills the destination array one byte at a time by calling the virtual 
Get() method, which adds up to tens of millions of calls. The class 
MemoryMappedFileByteBuffer, which inherits from ByteBuffer, implements Get() as 
follows:

        public override byte Get()
        {
            return _accessor.ReadByte(Ix(NextGetIndex()));
        }


This is horribly inefficient, and it shows: internally, every ReadByte call 
makes the .NET accessor validate the requested position against the accessor's 
bounds and then acquire the mapped view pointer, just to read a single byte, 
and this is repeated millions of times.
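To see the difference in isolation, the sketch below reads the same range of a 
memory-mapped view twice: once with one ReadByte call per byte, which is what 
the per-byte Get() loop effectively does, and once with a single ReadArray 
call. It is a minimal sketch assuming .NET Core / .NET 5 or later (for an 
anonymous CreateNew mapping and `using var`); the class and method names are 
hypothetical, and the elapsed times it reports are illustrative only, so no 
particular speedup should be read into them.

```csharp
using System;
using System.Diagnostics;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: contrasts per-byte ReadByte calls with a single
// ReadArray call on the same MemoryMappedViewAccessor.
public static class AccessorReadComparison
{
    // One ReadByte call per byte: each call validates the position and
    // acquires/releases the mapped view pointer on its own.
    public static void ReadPerByte(MemoryMappedViewAccessor accessor, long position,
                                   byte[] dst, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            dst[offset + i] = accessor.ReadByte(position + i);
    }

    // One ReadArray call for the whole range.
    public static void ReadBulk(MemoryMappedViewAccessor accessor, long position,
                                byte[] dst, int offset, int length)
    {
        int read = accessor.ReadArray(position, dst, offset, length);
        if (read != length)
            throw new InvalidOperationException($"Expected {length} bytes, got {read}.");
    }

    // Fills an anonymous (non-persisted) mapping with a known pattern, reads it
    // back both ways and returns both copies plus the elapsed time of each read.
    public static (byte[] perByte, byte[] bulk, TimeSpan perByteTime, TimeSpan bulkTime) Run(int size)
    {
        using var mmf = MemoryMappedFile.CreateNew(null, size);
        using var accessor = mmf.CreateViewAccessor(0, size);

        var pattern = new byte[size];
        for (int i = 0; i < size; i++)
            pattern[i] = (byte)(i * 31);
        accessor.WriteArray(0, pattern, 0, size);

        var perByte = new byte[size];
        var sw = Stopwatch.StartNew();
        ReadPerByte(accessor, 0, perByte, 0, size);
        TimeSpan perByteTime = sw.Elapsed;

        var bulk = new byte[size];
        sw.Restart();
        ReadBulk(accessor, 0, bulk, 0, size);
        TimeSpan bulkTime = sw.Elapsed;

        return (perByte, bulk, perByteTime, bulkTime);
    }
}
```

Both arrays returned by Run should hold identical bytes; only the elapsed times 
differ.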
By giving MemoryMappedFileByteBuffer its own override of the bulk 
Get(byte[], int, int) method:

        public override ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            // One ReadArray call copies the whole range instead of one ReadByte call per byte.
            _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
            return this;
        }

... a speedup by a factor of 5 or more can be obtained. Startup and query times 
are greatly improved.
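The override assumes that NextGetIndex(length) reserves the whole range in one 
step, advancing the position by length, so the buffer ends up in the same state 
as after the per-byte loop. A hypothetical, self-contained stand-in for that 
bookkeeping could look like this, assuming .NET Core / .NET 5 or later. 
Position, Limit, NextGetIndex and Ix are modeled on Java NIO's Buffer, which 
this ByteBuffer appears to mirror; they are not Lucene.NET's actual members, 
and InvalidOperationException stands in for BufferUnderflowException.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical, stripped-down read buffer over a MemoryMappedViewAccessor.
// Member names are modeled on Java NIO's Buffer, not taken from Lucene.NET.
public sealed class MappedReadBuffer
{
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly long _offset; // start of this buffer within the view

    public MappedReadBuffer(MemoryMappedViewAccessor accessor, long offset, int limit)
    {
        _accessor = accessor;
        _offset = offset;
        Limit = limit;
    }

    public int Position { get; private set; }
    public int Limit { get; }
    public int Remaining => Limit - Position;

    // Translates a buffer index into an accessor position.
    private long Ix(int i) => _offset + i;

    // Reserves nb bytes: checks there is room, advances Position by nb and
    // returns the index at which the reserved range starts.
    private int NextGetIndex(int nb)
    {
        if (Remaining < nb)
            throw new InvalidOperationException("Buffer underflow");
        int p = Position;
        Position += nb;
        return p;
    }

    // Single-byte read: one accessor call per byte.
    public byte Get() => _accessor.ReadByte(Ix(NextGetIndex(1)));

    // Bulk read: bounds check, then one ReadArray call for the whole range.
    public MappedReadBuffer Get(byte[] dst, int offset, int length)
    {
        if (dst == null) throw new ArgumentNullException(nameof(dst));
        if (offset < 0 || length < 0 || offset > dst.Length - length)
            throw new ArgumentOutOfRangeException(nameof(length));
        _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
        return this;
    }
}
```

For example, with the buffer starting at view offset 4, a bulk read of 3 bytes 
followed by a single Get() returns the bytes at view positions 4 to 6 and then 
7, exactly as four consecutive Get() calls would.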
Similarly, one can define the corresponding bulk Put method:

        public override ByteBuffer Put(byte[] src, int offset, int length)
        {
            CheckBounds(offset, length, src.Length);
            if (length > Remaining)
                throw new BufferOverflowException();
            // One WriteArray call copies the whole range instead of one write per byte.
            _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
            return this;
        }


... for a similar improvement in write times, but this was not extensively 
tested.
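The write side follows the same pattern. Below is a minimal, hypothetical 
sketch of a bulk Put over a MemoryMappedViewAccessor, assuming .NET Core / 
.NET 5 or later. Position, Limit, NextPutIndex and Ix are modeled on Java NIO's 
Buffer rather than taken from Lucene.NET, and InvalidOperationException stands 
in for BufferOverflowException.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical write buffer; member names are modeled on Java NIO's Buffer,
// not taken from Lucene.NET.
public sealed class MappedWriteBuffer
{
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly long _offset; // start of this buffer within the view

    public MappedWriteBuffer(MemoryMappedViewAccessor accessor, long offset, int limit)
    {
        _accessor = accessor;
        _offset = offset;
        Limit = limit;
    }

    public int Position { get; private set; }
    public int Limit { get; }
    public int Remaining => Limit - Position;

    // Translates a buffer index into an accessor position.
    private long Ix(int i) => _offset + i;

    // Reserves nb bytes for writing and advances Position past them.
    private int NextPutIndex(int nb)
    {
        if (Remaining < nb)
            throw new InvalidOperationException("Buffer overflow");
        int p = Position;
        Position += nb;
        return p;
    }

    // Bulk write: bounds check, then one WriteArray call for the whole range.
    public MappedWriteBuffer Put(byte[] src, int offset, int length)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (offset < 0 || length < 0 || offset > src.Length - length)
            throw new ArgumentOutOfRangeException(nameof(length));
        _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
        return this;
    }
}
```

The data only lands in the mapped view; flushing it to disk (for example with 
the accessor's Flush() method) remains a separate step.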

Do with this information as you please.

Vincent
