As suspected in my first email on this topic, the test was wrong: garbage collection was kicking in at the wrong times, so the results were confounded. Ah, the good old predictable C times are long gone.
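For anyone repeating the test, here is a sketch of the kind of warm-up/GC dance that keeps collections out of the timed section (the helper name is made up and this is not the actual PerfTest code, just the general idea):

    static long timeRun(Runnable bench) throws InterruptedException {
        // warm up: let the JIT compile the hot loop before anything is measured
        for (int i = 0; i < 3; i++) {
            bench.run();
        }
        // request a collection and let it settle, so a GC pause does not land
        // in the middle of the measured run
        System.gc();
        Thread.sleep(100);

        long start = System.nanoTime();
        bench.run();
        return System.nanoTime() - start;
    }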
Anyhow, apologies for this noise (particularly to Yonik). The real speed relationships are attached now and everything looks as it should: BitSetIterator is slower than OpenBitSet in the skipTo() scenario...

----- Original Message ----
From: eks dev <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, 7 September, 2006 10:03:45 AM
Subject: Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

The same effect showed up on an Athlon CPU... The only difference between your PerfTest and what I have been doing is that your loop is as tight as it gets. *Maybe* worth investigating a bit further, as my tests look a bit closer to real usage scenarios. I will try to identify the cases where OpenBitSet nextSetBit starts being slower... Looks a lot like what you said, ntz() vs Long.numberOfTrailingZeros(). The second one is (I guess) a native call, so the optimizations in ntz() go in and out of some cache (too deep for me anyhow, just shooting in the dark).

Good news: in spite of your doubts, BitSetIterator rocks in the skipTo scenario! VInt does as well, even in medium-density cases (the memory break-even point is now the only condition for its usage).

On another note, the key to really efficient matchers will be a good SmartMatcherFactory that picks the best representation for a given density/"sortedness". The cases I've been able to identify so far:

- Very low density: IntList
- Low density: VIntSortedList
- Dense: OpenBitSet/BitSetIterator or such
- Sorted: imagine a case where you have the opportunity to sort your index on a category field (quite often possible, I guess, since it does not require absolute "sortedness"; it is enough to sort periodically without caring about smaller updates). There, one simple interval list can do the magic in just a few bytes of memory, even in high-density cases.

Finding a good strategy for picking the optimal Matcher representation is not all that hard, but deciding when to do it will be a bit trickier. The warm-up case is easy, but the dynamic LRU case gets complicated: you first need to collect bits efficiently in some hits collector and then transform them into the optimal representation (SmartBitSet) quickly, otherwise the benefit of LRU caching is lost on the time spent updating the cache... More ideas on this?

----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, 6 September, 2006 11:54:14 PM
Subject: Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

On 9/6/06, eks dev <[EMAIL PROTECTED]> wrote:
> still, DenseBitsMatcher (BitSetIterator wrapped in a Matcher) works faster than
> anything else for this case:
>
> int skip(Matcher m) throws IOException {
>     int doc = -1, ret = 0;
>     while (m.skipTo(doc + 1)) {
>         doc = m.doc();
>         ret += doc;
>     }
>     return ret;
> }

Huh... that doesn't make much sense. Maybe it's your Pentium-M with its shorter pipeline and smaller penalty for a pipeline flush on a branch-prediction miss. You could try substituting BitUtil.ntz2 for ntz in OpenBitSet.nextSetBit... it uses a full binary search down to the byte level and then an array lookup. The nice thing about being *open* is that you can write more than one type of iterator if you really want to :-)

-Yonik
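P.S. The binary-search-down-to-a-byte-plus-table-lookup approach Yonik describes would look roughly like the sketch below. This is my own reconstruction from his one-line description, not the actual BitUtil.ntz2 code, so take the details with a grain of salt:

    private static final byte[] NTZ_TABLE = new byte[256];
    static {
        NTZ_TABLE[0] = 8;                        // convention for an all-zero byte
        for (int i = 1; i < 256; i++) {
            int n = 0;
            while (((i >> n) & 1) == 0) n++;     // trailing zero bits of this byte value
            NTZ_TABLE[i] = (byte) n;
        }
    }

    /** Number of trailing zeros of x; returns 64 for x == 0. */
    static int ntz(long x) {
        if (x == 0) return 64;
        int n = 0;
        // binary search for the lowest non-zero byte
        if ((x & 0xFFFFFFFFL) == 0) { n += 32; x >>>= 32; }
        if ((x & 0xFFFFL) == 0)     { n += 16; x >>>= 16; }
        if ((x & 0xFFL) == 0)       { n += 8;  x >>>= 8;  }
        // finish with an array lookup on the remaining low byte
        return n + NTZ_TABLE[(int) (x & 0xFF)];
    }

Whether the table lookup at the end beats a couple more masked tests probably depends on the CPU, which would fit the Pentium-M vs. Athlon differences we are seeing.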