Here are the summary tables :

First a table to remind metrics on the Tiers :
Tile Level TierLegnth TierBoxes TileLength (miles) 0 1 1 24902 1 2 4 12451 2
4 16 6225,5 3 8 64 3112,75 4 16 256 1556,375 5 32 1024 778,1875 6 64 4096
389,09375 7 128 16384 194,546875 8 256 65536 97,2734375 9 512 262144
48,63671875 10 1024 1048576 24,31835938 11 2048 4194304 12,15917969 12 4096
16777216 6,079589844 13 8192 67108864 3,039794922 14 16384 268435456
1,519897461 15 32768 1073741824 0,75994873


Then the comparaison table between legacy and new bestFit :
Radius (miles) legacy bestFit legacy bestFit TileLength legacy bestFit max
number of Box to fetch new bestFit new bestFit TileLength new bestFit number
of Box to fetch 1 18 0,75994873 9 14 1,519897461 4 5 16 0,75994873 64 12
6,079589844 4 10 15 0,75994873 225 11 12,15917969 4 25 13 3,039794922 100 9
24,31835938 9 50 12 6,079589844 100 8 97,2734375 4 100 11 12,15917969 100 7
194,546875 4 250 10 24,31835938 144 6 389,09375 4 500 9 48,63671875 144 5
778,1875 4 1000 8 97,2734375 144 4 1556,375 4 2500 7 194,546875 196 3
3112,75 4 5000 6 389,09375 196 2 6225,5 4 10000 5 778,1875 196 1 12451 4

I hope mailers will keep the formating ...





If not I shall post on JIRA.

Formulas :
TileLength is 24902 (earth circumference) / TierLength
bestFit formulas as summarized by Grant in his email.
number of box to fetch : pow(ceil(TileLength/Radius)+1,2) =>
TileLength/Radius is for how many tiles are needed to cover the radius, +1
is because you are not always well aligned, the pow(X,2) because there is
two directions/axis

Best regards,

Nicolas

2010/4/14 Chris Male <gento...@gmail.com>

> Hi,
>
> On Wed, Apr 14, 2010 at 6:07 PM, Grant Ingersoll <gsing...@apache.org>wrote:
>
>>
>> On Apr 14, 2010, at 11:06 AM, Chris Male wrote:
>>
>> > Hi,
>> >
>> > My understanding of the benefits of the new algorithm is that it means a
>> lower tier level resulting in fewer boxes, but more documents inside those
>> boxes that are outside of the search radius.
>> >
>> > While having fewer boxes means fewer term queries to make against the
>> index, more documents means more costly calculations to filter out those
>> extraneous documents.
>> >
>> > For those doing just Cartesian Tier filtering it seems like the new
>> approach is a win, but for those doing distance calculations on those
>> documents passing the filter, it seems to come at a cost.
>>
>> Currently, this is only used for filtering.  AIUI, Tiers aren't really
>> that useful for distance calculations, are they?  After all, all you have is
>> a box id and you'd have to reverse out the calc of that to be able to calc a
>> distance, no?  Perhaps I'm missing something.
>>
>>
> How Spatial Lucene currently works (or at least one of the ways it was
> designed to work), is using a 2 step filtering process.  Step 1 is the
> Cartesian Tier filtering.  The resulting set of Documents is then passed on
> through to Step 2 which then calculates the distance from each Document to
> the search centre.  If the distance is greater than the radius, the Document
> is filtered out.  This means that after both filtering steps you have only
> those Documents that are in the search radius.
>
> How this impacts this algorithm choice is that the more Documents the pass
> through Step 1, the more calculations that have to be done in Step 2.
>
>
>> I'm not sure, however, that it is a win for filtering.  It seems like you
>> end up including docs in the result set that should be in there.
>>
>> I'll wait for Nicolas' summary table, but I'm inclined to revert and then
>> someone can refactor if they want to offer alternate implementations.
>>
>> -Grant
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>
>
> --
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl
>

Reply via email to