Yeah, good hint. We actually made such measurements on our TreeIntegerSet implementation, and the results are astonishing: as I remember, 6 MB against 2 KB of memory consumption for "predominantly sorted" bit vectors like zip codes, and conjunction/disjunction an order of magnitude faster, since it walks a shallow tree in that case.

If you have any possibility to sort your indexes, do so. I guess even Lucene's on-disk representation appreciates this (skips are faster, and perhaps bit vectors on disk compress/decompress better?).

We even made a small visualizer for bit vectors that generates an image of the HitCollector results for any specified query: a gray image where every pixel represents 8-32 successive bits of the bit vector, higher density => darker color. I like to see the enemy first.

While we are already in this area, just a curiosity: a friend of mine has one head-spinning idea, to use graphics card hardware for super fast bit vector operations. These things are really optimized for basic bit operations today. I am just curious to see what he comes up with.

I hope I will have some time next week or so to polish some tests for OpenBitSet a bit and drop them somewhere on Jira, if anybody is interested in playing with them.
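To make the visualizer idea above a bit more concrete, here is a rough sketch of how such an image could be generated. It is not our actual tool: the class and method names (BitDensityImage, grayLevels, render) are made up for illustration, and it uses plain java.util.BitSet instead of a Lucene bit set so that it stays self-contained.

```java
import java.awt.image.BufferedImage;
import java.util.BitSet;

/** Hypothetical sketch: maps blocks of successive bits to gray pixels. */
public class BitDensityImage {

    /**
     * One gray level per block of bitsPerPixel successive bits.
     * 255 = white (no bits set), 0 = black (all bits set),
     * so higher density gives a darker pixel.
     */
    static int[] grayLevels(BitSet bits, int numBits, int bitsPerPixel) {
        int pixels = (numBits + bitsPerPixel - 1) / bitsPerPixel;
        int[] gray = new int[pixels];
        for (int p = 0; p < pixels; p++) {
            int from = p * bitsPerPixel;
            int to = Math.min(from + bitsPerPixel, numBits); // last block may be short
            int set = bits.get(from, to).cardinality();
            gray[p] = 255 - (255 * set) / (to - from);
        }
        return gray;
    }

    /** Lays the gray levels out row by row in an image of the given width. */
    static BufferedImage render(BitSet bits, int numBits, int bitsPerPixel, int width) {
        int[] gray = grayLevels(bits, numBits, bitsPerPixel);
        int height = Math.max(1, (gray.length + width - 1) / width);
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x;
                // Pixels past the end of the bit vector are painted white.
                int value = i < gray.length ? gray[i] : 255;
                img.getRaster().setSample(x, y, 0, value);
            }
        }
        return img;
    }
}
```

To look at a query, one would set a bit for each doc id collected in the HitCollector, call render(bits, maxDoc, 32, 512) or similar, and write the result out with javax.imageio.ImageIO.write(img, "png", file). Sorted or clustered indexes should then show up as dark bands instead of uniform gray noise.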
A bit off topic: is anybody working on a ChainedFilter version that uses docNrSkipper? As I recall, you wrote the BitSet version :)

----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org; eks dev <[EMAIL PROTECTED]>
Sent: Tuesday, 16 May, 2006 8:13:53 PM
Subject: Re: OpenBitSet

: I measured also on different densities, and it looks about the same.
: When I find a few spare minutes I will make one PerfTest that generates
: gnuplot diagrams. Would be interesting to see how all key methods behave
: as a function of density/size.

I was thinking the same thing ... i just haven't had time to play with it.

It might also be useful to check how the distribution of the set bits affects things -- i suspect that for some "Filters" there is some amount of clustering, as many people index their documents in a particular order and then filter on ranges of that order (ie: indexing documents as they are created, and then filtering on create date) ... using Random.nextGaussian() to pick which bits to set might be interesting.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]