On 8-Apr-09, at 11:13 PM, Michael Busch wrote:
I was thinking about doing this as part of LUCENE-1195. However, I
doubt that the net win will be very noticeable here. A common
scenario is that you have an index with one big body field that has
a lot of unique terms, plus several metafields that contribute less
unique terms. Even if all metafields together would contribute the
same amount of additional unique terms as the body field, this
proposed change would only save one term comparison per body term
lookup. The reason is the O(log(n)) of the in-memory binary search.
The story is a bit different for looking up terms on the smaller
metafields. Here you could probably save more term comparisons. But
I still think the improvement here would in the end be in the noise.
I mean how long do e.g. 30 in-memory term comparisons take compared
to all the disk seeks, sequential I/Os, VInt decodings, etc. that
every search needs to do? And you probably never have more than 2^30
unique terms in your index.
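This one-comparison estimate can be checked with a quick back-of-the-envelope sketch (hypothetical term counts, not actual Lucene code): a binary search over n terms costs about ceil(log2(n)) comparisons, so splitting the metafields out of the dictionary barely changes the cost of a body-term lookup.

```java
// Back-of-the-envelope sketch (assumed term counts, not Lucene code):
// comparisons needed by a binary search over a merged vs. per-field
// term dictionary.
public class TermLookupCost {

    // ceil(log2(n)) = worst-case comparisons for a binary search over n terms
    static int comparisons(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        long bodyTerms = 1L << 20;  // ~1M unique body terms (assumed)
        long metaTerms = 1L << 20;  // all metafields together (assumed)

        // Merged dictionary: search over body + metafield terms together.
        int merged = comparisons(bodyTerms + metaTerms);
        // Per-field dictionary: search over the body terms only.
        int perField = comparisons(bodyTerms);

        System.out.println("merged=" + merged
            + " perField=" + perField
            + " saved=" + (merged - perField));
        // merged=21 perField=20 saved=1 -> one comparison saved per lookup
    }
}
```

Even with the metafields doubling the total term count, the body-field lookup only loses one comparison out of ~20, which is why the win shows up mainly on the small metafields.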
So I doubt this improvement will be noticeable, but I would be happy
if you proved me wrong and this was indeed a low-hanging fruit.
I had a similar thought. It is hard to improve logarithmic
algorithms, since you need to reduce N exponentially to get a linear
speed-up. OTOH, I have a few indices with several fields with
multiple millions of terms each (I don't think that is uncommon; even
indexing a docId per doc can cause this kind of situation). Also, in
your scenario, even if the main term search isn't much accelerated,
the metadata field lookups might be much faster, which could add up to
some sort of win.
Like Michael, I'm doubtful, but see no reason not to try it!
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org