I ll try to find a little bit of time tonight to make a sample data set go through the two calculations to see the differences. I ll make a summary table.
I ll comment the issue with some comments on 'my' version of the alogrithm right now. Nicolas 2010/4/14 Chris Male <gento...@gmail.com> > Hi, > > My understanding of the benefits of the new algorithm is that it means a > lower tier level resulting in fewer boxes, but more documents inside those > boxes that are outside of the search radius. > > While having fewer boxes means fewer term queries to make against the > index, more documents means more costly calculations to filter out those > extraneous documents. > > For those doing just Cartesian Tier filtering it seems like the new > approach is a win, but for those doing distance calculations on those > documents passing the filter, it seems to come at a cost. > > Cheers > Chris > > > On Wed, Apr 14, 2010 at 5:00 PM, Grant Ingersoll <gsing...@apache.org>wrote: > >> LUCENE-2359 changed the best fit calculation. I admit, I'm not entirely >> certain which one is right, so I thought we should step back and talk about >> what we are trying to achieve. >> >> Please correct me if/where I am wrong. >> >> Looking at the problem of tiers/tiles/grids in general, we are taking a >> sphere, projecting it into a 2D plane. Next, we are dividing up the plane >> into nested grids/tiers. Each tier contains 2^tier id boxes. Thus, tier >> level 2 divides the earth up into 4 boxes. 2^15 = 32,768 boxes. We then, >> for each box, give it a unique label which then becomes the token that we >> index. During indexing, we typically will index many tiers, i.e. tiers 4 >> through 15. >> >> During search, we take in a lat/lon and a radius. The goal is to do a >> search using the fewest terms possible. Thus, we need to pick the tier that >> contains/covers the radius with the fewest number of boxes so that we can >> enumerate a very small number of documents. Thus, we need to calculate the >> best fit, which is a method inside of the CartesianTierPlotter. >> >> In the old way, we did: >> >> bf = min( 15, ceil(log2( earth_circumference / ( ( miles/2) - sqrt( >> (miles/2)^2 / 2 ) ) ) + 1 ) // we won't go higher than 15 for accuracy >> reasons >> >> The new way is: >> >> bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) ) >> >> These are obviously two different calculations, never mind the min(15) >> issue, we can easily resolve that one. >> >> AFAICT, the new way is much less accurate, but will likely be faster. >> >> So, which is right? >> >> Unfortunately, I find almost zero documentation on this, probably b/c the >> nomenclature is off, but... >> >> -Grant >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org >> >> > > > -- > Chris Male | Software Developer | JTeam BV.| www.jteam.nl >