Re: [SPATIAL] Best Fit Calculation

Grant Ingersoll Wed, 14 Apr 2010 10:17:49 -0700

Thanks.  I added my comment on the issue.  I think we should revert and then 
someone can put up a patch to make this pluggable.  As it stands, this Best Fit 
calculation has nothing to do with the CartesianTierPlotter anyway, so we could 
refactor it pretty easily.


-Grant

On Apr 14, 2010, at 12:42 PM, Helleringer, Nicolas wrote:

> Tables are well on JIRA : https://issues.apache.org/jira/browse/LUCENE-2359
> 
> Nicolas
> 
> 2010/4/14 Helleringer, Nicolas <[email protected]>
> Here are the summary tables :
> 
> First a table to remind metrics on the Tiers :
> Tile Level TierLegnth TierBoxes TileLength (miles) 0 1 1 24902 1 2 4 12451 2 
> 4 16 6225,5 3 8 64 3112,75 4 16 256 1556,375 5 32 1024 778,1875 6 64 4096 
> 389,09375 7 128 16384 194,546875 8 256 65536 97,2734375 9 512 262144 
> 48,63671875 10 1024 1048576 24,31835938 11 2048 4194304 12,15917969 12 4096 
> 16777216 6,079589844 13 8192 67108864 3,039794922 14 16384 268435456 
> 1,519897461 15 32768 1073741824 0,75994873
> 
> 
> Then the comparaison table between legacy and new bestFit :
> Radius (miles) legacy bestFit legacy bestFit TileLength legacy bestFit max 
> number of Box to fetch new bestFit new bestFit TileLength new bestFit number 
> of Box to fetch 1 18 0,75994873 9 14 1,519897461 4 5 16 0,75994873 64 12 
> 6,079589844 4 10 15 0,75994873 225 11 12,15917969 4 25 13 3,039794922 100 9 
> 24,31835938 9 50 12 6,079589844 100 8 97,2734375 4 100 11 12,15917969 100 7 
> 194,546875 4 250 10 24,31835938 144 6 389,09375 4 500 9 48,63671875 144 5 
> 778,1875 4 1000 8 97,2734375 144 4 1556,375 4 2500 7 194,546875 196 3 3112,75 
> 4 5000 6 389,09375 196 2 6225,5 4 10000 5 778,1875 196 1 12451 4
> 
> I hope mailers will keep the formating ...
> 
> 
> 
> 
> 
> If not I shall post on JIRA.
> 
> Formulas :
> TileLength is 24902 (earth circumference) / TierLength
> bestFit formulas as summarized by Grant in his email.
> number of box to fetch : pow(ceil(TileLength/Radius)+1,2) => 
> TileLength/Radius is for how many tiles are needed to cover the radius, +1 is 
> because you are not always well aligned, the pow(X,2) because there is two 
> directions/axis
> 
> Best regards,
> 
> Nicolas
> 
> 2010/4/14 Chris Male <[email protected]>
> 
> Hi,
> 
> On Wed, Apr 14, 2010 at 6:07 PM, Grant Ingersoll <[email protected]> wrote:
> 
> On Apr 14, 2010, at 11:06 AM, Chris Male wrote:
> 
> > Hi,
> >
> > My understanding of the benefits of the new algorithm is that it means a 
> > lower tier level resulting in fewer boxes, but more documents inside those 
> > boxes that are outside of the search radius.
> >
> > While having fewer boxes means fewer term queries to make against the 
> > index, more documents means more costly calculations to filter out those 
> > extraneous documents.
> >
> > For those doing just Cartesian Tier filtering it seems like the new 
> > approach is a win, but for those doing distance calculations on those 
> > documents passing the filter, it seems to come at a cost.
> 
> Currently, this is only used for filtering.  AIUI, Tiers aren't really that 
> useful for distance calculations, are they?  After all, all you have is a box 
> id and you'd have to reverse out the calc of that to be able to calc a 
> distance, no?  Perhaps I'm missing something.
> 
> 
> How Spatial Lucene currently works (or at least one of the ways it was 
> designed to work), is using a 2 step filtering process.  Step 1 is the 
> Cartesian Tier filtering.  The resulting set of Documents is then passed on 
> through to Step 2 which then calculates the distance from each Document to 
> the search centre.  If the distance is greater than the radius, the Document 
> is filtered out.  This means that after both filtering steps you have only 
> those Documents that are in the search radius.
> 
> How this impacts this algorithm choice is that the more Documents the pass 
> through Step 1, the more calculations that have to be done in Step 2.
>  
> I'm not sure, however, that it is a win for filtering.  It seems like you end 
> up including docs in the result set that should be in there.
> 
> I'll wait for Nicolas' summary table, but I'm inclined to revert and then 
> someone can refactor if they want to offer alternate implementations.
> 
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
> 
> 
> -- 
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search

Re: [SPATIAL] Best Fit Calculation

Reply via email to