Hello everyone,

I've just had an interesting performance debugging session, and one of the 
things I've learned is probably applicable to Lucene.NET.
I'll share it here with no guarantees, hoping that it might be useful to someone.

Lucene.NET uses memory-mapped files for reading, most notably via 
MemoryMappedFileByteBuffer. Profiling indicated that there are two calls that 
have quite some overhead:

        public override ByteBuffer Get(byte[] dst, int offset, int length)
        public override byte Get()

These calls spend their time in two methods of MemoryMappedViewAccessor:

public int ReadArray<T>(long position, T[] array, int offset, int count) where T : struct;
public byte ReadByte(long position);

The implementation of both contains a lot of overhead, especially ReadArray<T>: 
apart from the parameter validation, this method makes sure that the generic 
parameter T is properly aligned. This is irrelevant in our use case, since T is 
byte. But because the method implementation doesn't make any assumptions about T 
(other than the fact that it must be a value type, which is the generic 
constraint), every call goes through the same motions, every time.
Microsoft should have provided specializations for common value types, and 
certainly for byte arrays. Sadly, this is not the case.
The other one, ReadByte, acquires and releases the (unsafe) pointer before 
dereferencing it to return a single byte.

A more efficient way to do this (while still avoiding unsafe code) is to acquire 
the pointer handle associated with the view accessor, and use that pointer to 
marshal data back to the caller.
To do this, MemoryMappedFileByteBuffer needs one extra member variable to hold 
the address:

       private long m_Ptr;


Then, the two MemoryMappedFileByteBuffer constructors need to be rewritten as 
follows (mainly to avoid code duplication):

        public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity)
            : this(accessor, capacity, 0)
        {
        }

        public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity, int offset)
            : base(capacity)
        {
            this.accessor = accessor;
            this.offset = offset;

            System.Runtime.CompilerServices.RuntimeHelpers.PrepareConstrainedRegions();
            try
            {
            }
            finally
            {
                bool success = false;
                accessor.SafeMemoryMappedViewHandle.DangerousAddRef(ref success);
                m_Ptr = accessor.SafeMemoryMappedViewHandle.DangerousGetHandle().ToInt64()
                      + accessor.PointerOffset;
            }
        }

The only thing this does is get the pointer handle. Yes, the method has the 
word "Dangerous" in its name, but used this way it's perfectly safe :). Note 
that this needs .NET version 4.5.1 or later, because we want the starting 
position of the view relative to the beginning of the memory-mapped file 
through the PointerOffset property, which is unavailable in earlier .NET 
releases.
What the constructor does is obtain a 64-bit quantity representing the start of 
the memory-mapped view. The special construct with an "empty try block" 
conforms to the documentation regarding constrained execution regions (although 
I think it's more of a cargo-cult thing here, since constrained execution 
doesn't solve a lot of problems in this case).

Finally, the Dispose method needs to be extended to release the pointer handle 
using DangerousRelease:

        public void Dispose()
        {
            if (accessor != null)
            {
                accessor.SafeMemoryMappedViewHandle.DangerousRelease();
                accessor.Dispose();
                accessor = null;
            }
        }

At this point, we can replace the ReadArray call in the Get(byte[], int, int) override with this:

Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);
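Put in context, the full override might then look like the sketch below. The 
bounds-checking helpers (CheckBounds, Remaining, BufferUnderflowException) are 
placeholders for whatever validation the existing Get implementation already 
performs; only the Marshal.Copy line is the actual change:

        public override ByteBuffer Get(byte[] dst, int offset, int length)
        {
            // Keep the existing argument/bounds validation of the class;
            // the names below are illustrative.
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();

            // Copy straight from the mapped view into the caller's array,
            // bypassing MemoryMappedViewAccessor.ReadArray<T> entirely.
            Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);
            return this;
        }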

And the Get() override, which used ReadByte, becomes:

        public override byte Get()
        {
            return Marshal.ReadByte(new IntPtr(m_Ptr + Ix(NextGetIndex())));
        }


The Marshal class contains read methods for various data types (ReadInt16, 
ReadInt32, and so on), and it would be possible to rewrite all the other 
methods that currently assemble values byte by byte. This is left as an 
exercise for the reader. In any case, these methods have a lot less overhead 
than the corresponding methods on the memory view accessor.
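As a starting point for that exercise, here's a sketch of what a four-byte read 
could look like. The method name and the Ix/NextGetIndex helpers follow the 
class's existing conventions; note that Marshal.ReadInt32 reads in native byte 
order, so if the current implementation assembles bytes in big-endian order 
(as the Lucene index format requires), a byte swap must be added:

        public override int GetInt32()
        {
            // Single 32-bit read from the mapped view; replaces four
            // byte-by-byte ReadByte calls plus shifting.
            int value = Marshal.ReadInt32(new IntPtr(m_Ptr + Ix(NextGetIndex(4))));

            // If the existing code assembles the value big-endian, swap here,
            // e.g. with System.Net.IPAddress.NetworkToHostOrder(value) on a
            // little-endian machine.
            return value;
        }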

In my measurements, even when files reside on slow devices, the performance 
improvements are noticeable: I'm seeing improvements of 5%, especially for 
large segments. If you have slow I/O, the slow I/O still dominates, of course: 
no such thing as a free lunch and all that.

As I said, no guarantees. Have fun with it! If you find something that is 
unacceptable, let me know.


Vincent
