Re: [spatial] Rethinking Cartesian Tiers implementation

Grant Ingersoll Wed, 30 Jun 2010 06:38:34 -0700

Taking a step back, I do think it is worthwhile to document what I believe is 
the intent of the Tiers/Tiles/Grid and how they work.


Here, in a nutshell, is my view of what Tiers does (note, this is separate from 
what the code does, as it tries to make some optimizations/lazy decisions):

1. Take a sphere/ellipsoid and project it into a 2D space
2. For a given zoom level (call it n)
        a.  Slice the 2D space up into 2^n boxes
        b. Label each of those boxes with a unique id (this is done in a lazy 
way, as we don't literally label all the boxes ahead of time)

During Indexing:
1. For each desired zoom level
        a. Given a latitude and longitude and a zoom level, add a term to a 
field for that zoom level that contains the id of the box that the lat/lon is 
in.  This is, essentially, a feature reduction step, especially in highly dense 
areas as it is now possible to represent a whole bunch of things with a single 
term

During search:
1. Given an input lat/lon and a distance
        a. Determine the "best fit" zoom level that will allow us to exclude 
the as many documents as possible using the fewest terms possible
        b. For that zoom level, get the ids of the boxes that cover the 
distance given
        c. Perform the search/filtering/sorting using the box ids

Now, obviously, the devil is in the details of all this.  One needs to know how 
to consistently create the box ids for starters and also determine the best fit 
(we have two different approaches for that, one committed and one in JIRA).  
One also has to deal with pesky things like the prime/180 meridian and the 
north and south pole in order to get this all right.

This approach has, I believe, some significant advantages, especially if one 
can do a good job determining the best fit, as it can significantly reduce the 
number of terms that need to be iterated and it also collapses lat/lon into a 
single field which also helps improve search times.  The downside is it often 
takes up more index space due to the many zoom levels.

Does this match up with other people's understanding?

-Grant

On Jun 29, 2010, at 11:02 PM, Grant Ingersoll wrote:

> After having spent a lot of time looking at the Tier stuff and debugging it 
> and discussing it on IRC, I am of the opinion (and, I still have some belief 
> that it's just me who isn't getting it, but others have confirmed my 
> suspicion) that the spatial Cartesian Tier implementation is broken and, due 
> to the lack of documentation and resources, it is not clear how to fix it; 
> furthermore we don't seem to have anyone with the bandwidth or interest to 
> fix it (I've tried, but I've about reached my limit on it in terms of 
> willingness to work on it).  Even if it is not broken and it is just me, the 
> fact that I don't get it after spending as much time on it that I have is not 
> a good sign for maintainability.  
> 
> AFAICT, a lot of it stems from the fact that SinusoidalProjection is broken 
> (https://issues.apache.org/jira/browse/LUCENE-2519) and then layers upon 
> layers of other, minor issues were built on top of that such that at the end 
> of those layers are unit tests that are nearly impossible to untangle and in 
> between is a fairly large mess.   Well, I suppose I should moderate those 
> statements: I think all of it probably works as long as you stay completely 
> in a given quadrant of the Earth for all your points (and likely really, just 
> the US/ northern/western hemispheres), but even then it only works b/c it 
> consistently calculates the wrong values for all points.
> 
> At the same time, I think the theory behind it (i.e. a labeled, nested grid 
> system) is useful for many people solving spatial search problems and can 
> offer significant query time savings.  So, I think there are a couple of 
> things we could do:
> 
> 1. Remove it and start from scratch and take a more standard approach
> 2. Find someone to fix the existing stuff who groks it
> 3. Document that it is only really useful for a given quadrant and then 
> either do 1 or 2 over the long term.  We still need to fix somethings, IMO, 
> to feel comfortable w/ it.
> 4. At a minimum, I think we should mark it as deprecated until it can be 
> better dealt with later
> 
> For now, I'm going to remove it from Solr and just focus on finishing up the 
> filtering stuff for the other point types.
> 
> Thoughts?  Volunteers?
> 
> -Grant
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [spatial] Rethinking Cartesian Tiers implementation

Reply via email to