Hello all,

This is my first message here. I've been using Lucene.NET for a while for a pretty big local search engine website and thought I'd share some of the performance improvements I came across.

I made lots of changes to the Lucene code base in the process, and had to write my own geographical based filters and sort classes.

I got it to a point where it can do a search through 17,000,000 records in under 200 ms, all this while also filtering results so they are within specified coordinates (each listing having its own coordinates), and also sorting results by distance to a certain point of origin - all on one machine. And I'm pretty sure it can be made even faster.

I'd like to share one of the easy and pretty noticeable performance improvements I did.

Lucene.NET, in its current implementation, makes heavy use of the .NET BitArray class. This class is not really performance optimized at all (you can look through the sources from Microsoft).

The problem is that Microsoft sealed the BitArray class so you can't derive from it, so I got the sources from Microsoft and then heavily optimized them for performance using pointers (unsafe code), and some improved algorithms for finding the Next Set Bit (which I use extensively in my own Geo-filters).

Also, if you can remove some of the valid argument checks from the BitArray class, then that's all the better for performance as well, because methods that throw explicit errors are not inlined by the .NET JIT.

The other big improvement I did was to keep a renewable pool of BitArray classes in memory in order to minimize garbage collection. What this means is, whenever I need a new BitArray, instead of creating a completely new instance that the GC has to manage, I get one from a pool of 'released' BitArrays. When I'm done with a BitArray, instead of letting it go to the GC, I just put it back in the pool so it can be reused. This reduced GCs to a minimum in my application, and the memory footprint of my application is also much more stable.

I realize however that this second improvement would break the Java code compatibility so it's just something more advanced users would want to do for themselves.

I'm not sure if attaching files here works, but I'll attach the performance BitArray class I have.

Thanks for all your hard work everyone!

Kind regards,
Andrei Alecu


// ==++== 
//
//   Copyright (c) Microsoft Corporation.  All rights reserved.
//
// ==--== 
/*==============================================================================
** 
** Class: BitArrayEx 
**
** 
** Purpose: The BitArrayEx class manages a compact array of bit values.
**
**
=============================================================================*/
namespace System.Collections
{

    using System;
    using System.Security.Permissions;
    using System.Collections.Specialized;
    using System.Threading;
    // A vector of bits.  Use this to store bits efficiently, without having to 
do bit 
    // shifting yourself.
    [System.Runtime.InteropServices.ComVisible(true)]
    [Serializable()]
    public sealed unsafe class BitArrayEx : ICollection, ICloneable
    {
        private BitArrayEx()
        {
        }

        
/*========================================================================= 
        ** Allocates space to hold length bit values. All of the values in the 
bit
        ** array are set to false. 
        **
        ** Exceptions: ArgumentException if length < 0.
        
=========================================================================*/
        public BitArrayEx(int length)
            : this(length, false)
        {
        }

        
/*=========================================================================
        ** Allocates space to hold length bit values. All of the values in the 
bit 
        ** array are set to defaultValue.
        **
        ** Exceptions: ArgumentOutOfRangeException if length < 0.
        
=========================================================================*/
        public BitArrayEx(int length, bool defaultValue)
        {
            if (length < 0)
            {
                throw new ArgumentOutOfRangeException("length");
            }

            m_array = new int[(length + 31) / 32];
            fixed (int * arr = m_array)
            {
                this.unsafe_array = arr;
            }
            m_length = length;
            
            int fillValue = defaultValue ? unchecked(((int)0xffffffff)) : 0;
            for (int i = 0; i < m_array.Length; i++)
            {
                m_array[i] = fillValue;
            }

            _version = 0;
        }

        public BitArrayEx ToBitArray()
        {
            var ret = new BitArrayEx(m_array);
            ret.Length = this.Length;
            return ret;
        }

        
/*=========================================================================
        ** Allocates space to hold the bit values in bytes. bytes[0] represents 
        ** bits 0 - 7, bytes[1] represents bits 8 - 15, etc. The LSB of each 
byte
        ** represents the lowest index value; bytes[0] & 1 represents bit 0, 
        ** bytes[0] & 2 represents bit 1, bytes[0] & 4 represents bit 2, etc. 
        **
        ** Exceptions: ArgumentException if bytes == null. 
        
=========================================================================*/
        public BitArrayEx(byte[] bytes)
        {
            if (bytes == null)
            {
                throw new ArgumentNullException("bytes");
            }

            m_array = new int[(bytes.Length + 3) / 4];
            m_length = bytes.Length * 8;
            fixed (int* arr = m_array)
            {
                this.unsafe_array = arr;
            }
            int i = 0;
            int j = 0;
            while (bytes.Length - j >= 4)
            {
                m_array[i++] = (bytes[j] & 0xff) |
                              ((bytes[j + 1] & 0xff) << 8) |
                              ((bytes[j + 2] & 0xff) << 16) |
                              ((bytes[j + 3] & 0xff) << 24);
                j += 4;
            }

            //BCLDebug.Assert(bytes.Length - j >= 0, "BitArrayEx byteLength 
problem");
            //BCLDebug.Assert(bytes.Length - j < 4, "BitArrayEx byteLength 
problem #2");

            switch (bytes.Length - j)
            {
                case 3:
                    m_array[i] = ((bytes[j + 2] & 0xff) << 16);
                    goto case 2;
                // fall through 
                case 2:
                    m_array[i] |= ((bytes[j + 1] & 0xff) << 8);
                    goto case 1;
                // fall through 
                case 1:
                    m_array[i] |= (bytes[j] & 0xff);
                    break;
            }

            _version = 0;
        }

        public BitArrayEx(bool[] values)
        {
            if (values == null)
            {
                throw new ArgumentNullException("values");
            }

            m_array = new int[(values.Length + 31) / 32];
            m_length = values.Length;
            fixed (int* arr = m_array)
            {
                this.unsafe_array = arr;
            }
            for (int i = 0; i < values.Length; i++)
            {
                if (values[i])
                    m_array[i / 32] |= (1 << (i % 32));
            }

            _version = 0;

        }

        
/*==========================================================================
        ** Allocates space to hold the bit values in values. values[0] 
represents 
        ** bits 0 - 31, values[1] represents bits 32 - 63, etc. The LSB of each
        ** integer represents the lowest index value; values[0] & 1 represents 
bit 
        ** 0, values[0] & 2 represents bit 1, values[0] & 4 represents bit 2, 
etc. 
        **
        ** Exceptions: ArgumentException if values == null. 
        
=========================================================================*/
        public BitArrayEx(int[] values)
        {
            if (values == null)
            {
                throw new ArgumentNullException("values");
            }

            m_array = new int[values.Length];
            m_length = values.Length * 32;
            fixed (int* arr = m_array)
            {
                this.unsafe_array = arr;
            }
            Array.Copy(values, m_array, values.Length);

            _version = 0;
        }

        
/*========================================================================= 
        ** Allocates a new BitArrayEx with the same length and bit values as 
bits. 
        **
        ** Exceptions: ArgumentException if bits == null. 
        
=========================================================================*/
        public BitArrayEx(BitArrayEx bits)
        {
            if (bits == null)
            {
                throw new ArgumentNullException("bits");
            }

            m_array = new int[(bits.m_length + 31) / 32];
            m_length = bits.m_length;
            fixed (int* arr = m_array)
            {
                this.unsafe_array = arr;
            }
            Array.Copy(bits.m_array, m_array, (bits.m_length + 31) / 32);

            _version = bits._version;
        }

        public bool this[int index]
        {
            get
            {
                return Get(index);
            }
            set
            {
                Set(index, value);
            }
        }

        
/*========================================================================== 
        ** Returns the bit value at position index. 
        **
        ** Exceptions: ArgumentOutOfRangeException if index < 0 or 
        **             index >= GetLength().
        
=========================================================================*/
        public bool Get(int index)
        {
            //if (index < 0 || index >= m_length)
            //{
            //    throw new ArgumentOutOfRangeException("index");
            //}

            return (unsafe_array[index / 32] & (1 << (index % 32))) != 0;
        }

        
/*==========================================================================
        ** Sets the bit value at position index to value.
        ** 
        ** Exceptions: ArgumentOutOfRangeException if index < 0 or
        **             index >= GetLength(). 
        
=========================================================================*/
        public void Set(int index, bool value)
        {
            if (index < 0 || index >= m_length)
            {
                throw new ArgumentOutOfRangeException("index");
            }

            if (value)
            {
                m_array[index / 32] |= (1 << (index % 32));
            }
            else
            {
                m_array[index / 32] &= ~(1 << (index % 32));
            }

            _version++;
        }
  

        //faster version
        public void SetFalse(int index)
        {
            m_array[index / 32] &= ~(1 << (index % 32));
        }

        //faster version
        public void SetTrue(int index)
        {
            m_array[index / 32] |= (1 << (index % 32));
        }

        
/*========================================================================= 
        ** Sets all the bit values to value.
        
=========================================================================*/
        public void SetAll(bool value)
        {
            int fillValue = value ? unchecked(((int)0xffffffff)) : 0;
            int ints = (m_length + 31) / 32;
            for (int i = 0; i < ints; i++)
            {
                unsafe_array[i] = fillValue;
            }

            _version++;
        }

        
/*==========================================================================
        ** Returns a reference to the current instance ANDed with value. 
        **
        ** Exceptions: ArgumentException if value == null or
        **             value.Length != this.Length.
        
=========================================================================*/
        public BitArrayEx And(BitArrayEx value)
        {
            if (value == null)
                throw new ArgumentNullException("value");
            if (m_length != value.m_length)
                throw new ArgumentException();

            int ints = (m_length + 31) / 32;
            for (int i = 0; i < ints; i++)
            {
                m_array[i] &= value.m_array[i];
            }

            _version++;
            return this;
        }

        
/*=========================================================================
        ** Returns a reference to the current instance ORed with value.
        ** 
        ** Exceptions: ArgumentException if value == null or
        **             value.Length != this.Length. 
        
=========================================================================*/
        public BitArrayEx Or(BitArrayEx value)
        {
            if (value == null)
                throw new ArgumentNullException("value");
            if (m_length != value.m_length)
                throw new ArgumentException();

            int ints = (m_length + 31) / 32;
            for (int i = 0; i < ints; i++)
            {
                m_array[i] |= value.m_array[i];
            }

            _version++;
            return this;
        }

        
/*=========================================================================
        ** Returns a reference to the current instance XORed with value. 
        ** 
        ** Exceptions: ArgumentException if value == null or
        **             value.Length != this.Length. 
        
=========================================================================*/
        public BitArrayEx Xor(BitArrayEx value)
        {
            if (value == null)
                throw new ArgumentNullException("value");
            if (m_length != value.m_length)
                throw new ArgumentException();

            int ints = (m_length + 31) / 32;
            for (int i = 0; i < ints; i++)
            {
                m_array[i] ^= value.m_array[i];
            }

            _version++;
            return this;
        }

        
/*=========================================================================
        ** Inverts all the bit values. On/true bit values are converted to 
        ** off/false. Off/false bit values are turned on/true. The current 
instance
        ** is updated and returned.
        
=========================================================================*/
        public BitArrayEx Not()
        {
            int ints = (m_length + 31) / 32;
            for (int i = 0; i < ints; i++)
            {
                m_array[i] = ~m_array[i];
            }

            _version++;
            return this;
        }


        public int Length
        {
            get { return m_length; }
            set
            {
                if (value < 0)
                {
                    throw new ArgumentOutOfRangeException("value");
                }

                int newints = (value + 31) / 32;
                if (newints > m_array.Length || newints + _ShrinkThreshold < 
m_array.Length)
                {
                    // grow or shrink (if wasting more than _ShrinkThreshold 
ints) 
                    int[] newarray = new int[newints];
                    Array.Copy(m_array, newarray, newints > m_array.Length ? 
m_array.Length : newints);
                    m_array = newarray;
                    fixed (int* arr = m_array)
                    {
                        this.unsafe_array = arr;
                    }
                }

                if (value > m_length)
                {
                    // clear high bit values in the last int 
                    int last = ((m_length + 31) / 32) - 1;
                    int bits = m_length % 32;
                    if (bits > 0)
                    {
                        m_array[last] &= (1 << bits) - 1;
                    }

                    // clear remaining int values
                    Array.Clear(m_array, last + 1, newints - last - 1);
                }

                m_length = value;
                _version++;
            }
        }

        // ICollection implementation
        public void CopyTo(Array array, int index)
        {
            if (array == null)
                throw new ArgumentNullException("array");

            if (index < 0)
                throw new ArgumentOutOfRangeException("index");

            if (array.Rank != 1)
                throw new ArgumentException();


            if (array is int[])
            {
                Array.Copy(m_array, 0, array, index, (m_length + 31) / 32);
            }
            else if (array is byte[])
            {
                if ((array.Length - index) < (m_length + 7) / 8)
                    throw new ArgumentException();

                byte[] b = (byte[])array;
                for (int i = 0; i < (m_length + 7) / 8; i++)
                    b[index + i] = (byte)((m_array[i / 4] >> ((i % 4) * 8)) & 
0x000000FF); // Shift to bring the required byte to LSB, then mask
            }
            else if (array is bool[])
            {
                if (array.Length - index < m_length)
                    throw new ArgumentException();

                bool[] b = (bool[])array;
                for (int i = 0; i < m_length; i++)
                    b[index + i] = ((m_array[i / 32] >> (i % 32)) & 0x00000001) 
!= 0;
            }
            else
                throw new ArgumentException();
        }

        public int Count
        {
            get
            {
                return m_length;
            }
        }

        public Object Clone()
        {
            BitArrayEx BitArrayEx = new BitArrayEx(m_array);
            BitArrayEx._version = _version;
            BitArrayEx.m_length = m_length;
            return BitArrayEx;
        }

        public Object SyncRoot
        {
            get
            {
                if (_syncRoot == null)
                {
                    System.Threading.Interlocked.CompareExchange(ref _syncRoot, 
new Object(), null);
                }
                return _syncRoot;
            }
        }

        public bool IsReadOnly
        {
            get
            {
                return false;
            }
        }


        public bool IsSynchronized
        {
            get
            {
                return false;
            }
        }

        public IEnumerator GetEnumerator()
        {
            return new BitArrayEnumeratorSimple(this);
        }

        public unsafe int GetNextSetBit(int indexStart)
        {
            int byteIndex = indexStart >> 5;
            int bitStart = 0;
            int retVal = -1;

            fixed (int* ints = m_array)
            {
                for(int i = byteIndex; i<m_array.Length; i++)
                {
                    if (ints[i] != 0)
                    {
                        if (i == byteIndex)
                        {
                            bitStart = indexStart % 32;
                        }
                        retVal = i * 32 + bitposlong(ints[i], bitStart);
                        return retVal;
                      
                    }
                }
            }
            return retVal;
        }

        int bitposlong(int val, int bitStart)
        {
            val = val >> bitStart;

            int b = 0;
            int tmp;

            if (val == 0)
                return 32;
            if ((tmp = (val & 0x0000ffff)) == 0)
                b = 16; /* b += 16 is equivelent */
            else
                val = tmp;

            if ((tmp = (val & 0x00ff00ff)) == 0)
                b += 8;
            else
                val = tmp;

            if ((tmp = (val & 0x0f0f0f0f)) == 0)
                b += 4;
            else
                val = tmp;

            if ((tmp = (val & 0x33333333)) == 0)
                b += 2;
            else
                val = tmp;

            if ((val & 0x55555555) == 0)
                b++;
            return b + bitStart;
        }

        public unsafe int GetSetBitCount()
        {
            if (bits_in_16bits == null) compute_bits_in_16bits();

            int bitCount = 0;
            fixed (int * arr = m_array){
                for (int i = 0; i < m_array.Length; i++)
                {
                    bitCount += bits_in_16bits[arr[i] & 0xffffu]
                                + bits_in_16bits[(arr[i] >> 16) & 0xffffu];     
      
                }
            }
            return bitCount;
        }

        uint iterated_bitcount(uint n)
        {
            uint count=0;    
            while (n > 0)
            {
                count += n & 0x1u ;    
                n >>= 1 ;
            }
            return count ;
        }

        static int[] bits_in_16bits ;

        void compute_bits_in_16bits ()
        {
            bits_in_16bits = new int[0x1u << 16];
            uint i ;    
            for (i = 0; i < (0x1u<<16); i++)
                bits_in_16bits [i] = (int)iterated_bitcount (i) ;
            return ;
        }

        int precomputed16_bitcount (uint n)
        {
            // works only for 32-bit int
            
            return bits_in_16bits [n         & 0xffffu]
                +  bits_in_16bits [(n >> 16) & 0xffffu] ;
        }


        [Serializable()]
        private class BitArrayEnumeratorSimple : IEnumerator, ICloneable
        {
            private BitArrayEx BitArrayEx;
            private int index;
            private int version;
            private bool currentElement;

            internal BitArrayEnumeratorSimple(BitArrayEx BitArrayEx)
            {
                this.BitArrayEx = BitArrayEx;
                this.index = -1;
                version = BitArrayEx._version;
            }

            public Object Clone()
            {
                return MemberwiseClone();
            }

            public virtual bool MoveNext()
            {
                if (version != BitArrayEx._version) throw new 
InvalidOperationException();
                if (index < (BitArrayEx.Count - 1))
                {
                    index++;
                    currentElement = BitArrayEx.Get(index);
                    return true;
                }
                else
                    index = BitArrayEx.Count;

                return false;
            }

            public virtual Object Current
            {
                get
                {
                    if (index == -1)
                        throw new InvalidOperationException();
                    if (index >= BitArrayEx.Count)
                        throw new InvalidOperationException();
                    return currentElement;
                }
            }

            public void Reset()
            {
                if (version != BitArrayEx._version) throw new 
InvalidOperationException();
                index = -1;
            }
        }

        private int* unsafe_array;

        private int[] m_array;

        public int[] IntArray
        {
            get { return m_array; }
            set { m_array = value; }
        }

        private int m_length;
        private int _version;
        [NonSerialized]
        private Object _syncRoot;

        private const int _ShrinkThreshold = 256000;
    }

}

Reply via email to