Michael,

Unsafe code is not strictly required; it's just an extra squeeze of performance juice I use myself. If you need to stay away from unsafe code, that's fine, but using pointers instead of accessing the array in a managed way gives a pretty nice performance boost in tight loops.

If you compare the assembly code that the JIT generates for an array lookup with the code for accessing the same memory location through a pointer, you'll see that the pointer version is a bit more efficient, largely because it skips the array bounds check on each access.

But, like I said, all BitArray needs is a more efficient next-set-bit implementation and access to the underlying memory store it uses (in the case of .NET's BitArray, an array of ints).
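
To make "next set bit over an array of ints" concrete, here is a minimal sketch in Java (the language of the Lucene/Solr OpenBitSet mentioned in this thread). The class name IntBitScan and its method are hypothetical, not part of Lucene, Solr, or .NET; the sketch assumes 32 bits per word with bit 0 stored in the lowest-order position of words[0]. It skips empty words one whole int at a time rather than testing bit by bit.

```java
// Hypothetical helper for illustration only; not taken from Lucene, Solr, or .NET.
final class IntBitScan {

    /**
     * Returns the index of the first set bit at or after fromIndex,
     * or -1 if no such bit exists. Assumes 32 bits per word, with
     * bit 0 in the lowest-order position of words[0].
     */
    static int nextSetBit(int[] words, int fromIndex) {
        if (fromIndex < 0) {
            throw new IllegalArgumentException("fromIndex must be >= 0");
        }
        int w = fromIndex >>> 5;                 // index of the word containing fromIndex
        if (w >= words.length) {
            return -1;
        }
        // Clear the bits below fromIndex in the first word so they are ignored.
        int word = words[w] & (-1 << (fromIndex & 31));
        while (true) {
            if (word != 0) {
                // The lowest set bit of this word is the answer.
                return (w << 5) + Integer.numberOfTrailingZeros(word);
            }
            if (++w == words.length) {
                return -1;                       // ran off the end without finding a bit
            }
            word = words[w];                     // empty words cost one comparison each
        }
    }
}
```

A caller would loop with `for (int i = IntBitScan.nextSetBit(words, 0); i >= 0; i = IntBitScan.nextSetBit(words, i + 1))`, which is why direct access to the backing int array matters: without it, each bit has to be tested individually.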

Andrei

Michael Garski wrote:
In 2.3, the document id is checked in the filter after it is scored and
before it is passed to the hit collector, which can result in a poorly
performing search when a common term is combined with a sparsely populated
filter.  I created my own filter implementation based on the
DocSet/OpenBitSet classes that are in Solr, where the implementation of
getting the next set bit is very efficient, and does not use unsafe
code.  With my own filter implementation I was also able to work around
the memory leak issue with the cached BitArrays that Digy has noted
earlier.

The filter implementation in Lucene 2.4 has been overhauled to allow you to
create your own filter implementation, with OpenBitSet as the default.
Additionally, I believe the filter is enumerated along with the
termdocs, leading to faster searches with sparsely populated filters.
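
As a rough illustration of why enumerating the filter alongside the term's documents helps with sparse filters, here is a sketch in Java. It is not Lucene's actual code: the class FilteredEnumeration and its method are hypothetical, the term's postings are modelled as a strictly increasing int array, and the filter is modelled with java.util.BitSet. Each side jumps ahead to the other's current document, so documents excluded by the filter are skipped rather than scored and then rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not the Lucene 2.4 implementation.
final class FilteredEnumeration {

    /**
     * Returns the doc ids present in both sortedTermDocs and filter.
     * Assumes sortedTermDocs is strictly increasing.
     */
    static List<Integer> matchingDocs(int[] sortedTermDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int candidate = filter.nextSetBit(0);    // next doc the filter allows
        while (candidate >= 0 && i < sortedTermDocs.length) {
            int doc = sortedTermDocs[i];
            if (doc < candidate) {
                // Jump the term enumeration forward to the filter's candidate.
                int pos = Arrays.binarySearch(
                        sortedTermDocs, i, sortedTermDocs.length, candidate);
                i = (pos >= 0) ? pos : -pos - 1;
            } else if (doc > candidate) {
                // Jump the filter forward to the term's current document.
                candidate = filter.nextSetBit(doc);
            } else {
                hits.add(doc);                   // both sides agree: this doc would be scored
                i++;
                candidate = filter.nextSetBit(doc + 1);
            }
        }
        return hits;
    }
}
```

With a common term and a filter that has only a handful of bits set, the loop runs roughly once per set bit in the filter instead of once per document containing the term.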


Michael

