Thanks to everyone for the replies. I'm going to try several of these approaches with equivalent data sets and run some side-by-side tests.
No guarantees on timing, but I'll report back with the different approaches and the test results.

cheers,
-- j

On 2/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Very good points, I hadn't considered the term frequency of the digits
> : affecting scoring. As an aside, can that aspect of the score be ignored for
> : these fields?
>
> The easiest way is to use a boost that is so low it's insignificant, or
> you could subclass TermQuery and override getSimilarity to return a
> DelegateSimilarity which wraps the real instance and returns constant
> values for things like tf() and idf() ... but I'm 95% sure that using a
> RangeFilter (or a ConstantScoreRangeQuery) is going to be faster than all
> of those TermQueries no matter what.
>
> : I need to spend more time with FunctionQuery, I haven't given it the
> : attention it deserves.
>
> I would start by trying out an apples-to-apples comparison of your current
> approach with one where your index only has one indexed field each for
> long/lat that uses ConstantScoreRangeQuery to do the boxing. Compare
> the size of the resulting indexes, the memory footprint while open, and
> the time spent executing comparable queries. You should probably compare
> queries that involve both large boxes and small boxes, and depending on
> the usage pattern you expect, consider caching your Filters if you expect
> many boxes to be reused frequently.
>
> Once you've found the "best" way to do your boxing ... then look into
> using FunctionQueries to influence your scores based on distance from the
> center of the box.
>
> : Great feedback, thanks for the notes.
> :
> : -- jeff
> :
> : On 2/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : >
> : > : Geo definition:
> : > : Boxing around a center point. It's not critical to do a radius search with
> : > : a given circle. A boxed approach allows for taller or wider frames of
> : > : reference, which are applicable for our use.
> : >
> : > If you are just looking to confine your results to a box then I think
> : > RangeFiltering on both the X and Y axis will be more efficient than the
> : > individual term queries you are producing.
> : >
> : > It will have the added bonus of not artificially affecting the scores of
> : > the documents based on how often a particular digit appears in a particular
> : > position of the latitude across your corpus.
> : >
> : > Once you've filtered down to a particular bounding box, you might consider
> : > going back to the function query approach to score documents inside that
> : > box based on their actual distance from the center point. I don't recall
> : > at the moment but I believe FunctionQuery's Scorer supports skipTo in such
> : > a way that it won't bother computing the function for a document that has
> : > been skipped (i.e., when contained in a BooleanQuery with another clause
> : > that has already prohibited it, or when executed in the context of a
> : > Filter)
> : >
> : > -Hoss
>
> -Hoss
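
For anyone following along, the two-step idea from the quoted thread (confine results to a box first, then order what is left by distance from the box center) can be sketched in plain Java. This is only an illustration of the logic and does not use any Lucene classes: BoxThenDistance, Doc, Box, filter, distanceSq and rank are all hypothetical names made up for the example, and in a real index the filtering step would be done by range filters on the lat/long fields rather than by a loop.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: none of these names are Lucene classes.
final class BoxThenDistance {

    /** A document with an id and a position in decimal degrees. */
    static final class Doc {
        final String id;
        final double lat;
        final double lon;

        Doc(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** An axis-aligned box; it may be taller or wider than a square. */
    static final class Box {
        final double minLat, maxLat, minLon, maxLon;

        Box(double minLat, double maxLat, double minLon, double maxLon) {
            this.minLat = minLat;
            this.maxLat = maxLat;
            this.minLon = minLon;
            this.maxLon = maxLon;
        }

        /** Inclusive range check on both axes. */
        boolean contains(Doc d) {
            return d.lat >= minLat && d.lat <= maxLat
                && d.lon >= minLon && d.lon <= maxLon;
        }

        double centerLat() { return (minLat + maxLat) / 2.0; }
        double centerLon() { return (minLon + maxLon) / 2.0; }
    }

    /** Step 1: keep only the documents inside the box. */
    static List<Doc> filter(List<Doc> docs, Box box) {
        List<Doc> inside = new ArrayList<>();
        for (Doc d : docs) {
            if (box.contains(d)) {
                inside.add(d);
            }
        }
        return inside;
    }

    /** Squared planar distance, in degrees, from the box center. */
    static double distanceSq(Doc d, Box box) {
        double dLat = d.lat - box.centerLat();
        double dLon = d.lon - box.centerLon();
        return dLat * dLat + dLon * dLon;
    }

    /** Step 2: order the survivors by distance from the center, nearest first. */
    static List<Doc> rank(List<Doc> docs, Box box) {
        List<Doc> inside = filter(docs, box);
        inside.sort(Comparator.comparingDouble((Doc d) -> distanceSq(d, box)));
        return inside;
    }
}
```

Only documents that survive the box check ever have a distance computed, which is the same saving described above for a scorer that skips filtered-out documents. The distance here is a flat approximation in degrees, which is an assumption that only holds for small boxes, and the sketch does not handle a box that crosses the 180 degree meridian.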