Hello (again),
During performance analysis with an index of 25 million documents and queries
having 50 or more clauses, a hotspot was spotted (no pun intended) in the
following ByteBuffer method:
    public virtual ByteBuffer Get(byte[] dst, int offset, int length)
    {
        CheckBounds(offset, length, dst.Length);
        if (length > Remaining)
            throw new BufferUnderflowException();
        int end = offset + length;
        for (int i = offset; i < end; i++)
            dst[i] = Get();
        return this;
    }
This fills a buffer by calling the Get() method tens of millions of times. The
class MemoryMappedFileByteBuffer, which inherits from ByteBuffer, does the
following:
    public override byte Get()
    {
        return _accessor.ReadByte(Ix(NextGetIndex()));
    }
This is horribly inefficient, and it shows: for every single byte read, the
.NET implementation internally validates the constrained region and then
acquires the mapped pointer, and it does so millions of times over.
By providing MemoryMappedFileByteBuffer with its own implementation:
    public override ByteBuffer Get(byte[] dst, int offset, int length)
    {
        CheckBounds(offset, length, dst.Length);
        if (length > Remaining)
            throw new BufferUnderflowException();
        _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
        return this;
    }
... a speedup of a factor of 5 or more can be obtained. Startup and query times
are greatly improved.
Similarly, one can define the corresponding:
    public override ByteBuffer Put(byte[] src, int offset, int length)
    {
        CheckBounds(offset, length, src.Length);
        if (length > Remaining)
            throw new BufferOverflowException();
        _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
        return this;
    }
... for a similar improvement in write times, but this was not extensively
tested.
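For anyone who wants to try the two access patterns outside of Lucene.NET, here
is a minimal standalone sketch. The class and method names (BulkReadSketch,
ReadPerByte, ReadBulk, Demo) are made up for illustration and are not part of
the library; only the MemoryMappedViewAccessor calls are real .NET API. It just
checks that both approaches return the same bytes; it does not measure
anything.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper class, for illustration only (not Lucene.NET code).
public static class BulkReadSketch
{
    // Per-byte copy: one ReadByte call, with its own validation, per element.
    public static void ReadPerByte(MemoryMappedViewAccessor accessor,
        long position, byte[] dst, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            dst[offset + i] = accessor.ReadByte(position + i);
    }

    // Bulk copy: a single ReadArray call for the whole range.
    public static void ReadBulk(MemoryMappedViewAccessor accessor,
        long position, byte[] dst, int offset, int length)
    {
        int read = accessor.ReadArray(position, dst, offset, length);
        if (read != length)
            throw new EndOfStreamException();
    }

    // Writes a small temp file, maps it, and reads the same range both ways.
    // Returns true if both reads agree with each other and with the source.
    public static bool Demo()
    {
        string path = Path.GetTempFileName();
        try
        {
            byte[] data = new byte[4096];
            for (int i = 0; i < data.Length; i++)
                data[i] = (byte)(i % 251);
            File.WriteAllBytes(path, data);

            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            using (var accessor = mmf.CreateViewAccessor(
                0, data.Length, MemoryMappedFileAccess.Read))
            {
                byte[] perByte = new byte[100];
                byte[] bulk = new byte[100];
                ReadPerByte(accessor, 1000, perByte, 10, 50);
                ReadBulk(accessor, 1000, bulk, 10, 50);

                for (int i = 0; i < perByte.Length; i++)
                    if (perByte[i] != bulk[i]) return false;
                for (int i = 0; i < 50; i++)
                    if (bulk[10 + i] != data[1000 + i]) return false;
                return true;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Wrapping the two calls in a Stopwatch over a larger file should give a rough
idea of the gap on a given machine.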
Do with this information as you please.
Vincent