[jira] [Comment Edited] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

John Berryman (JIRA) Thu, 09 May 2013 19:39:17 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653095#comment-13653095
 ]


John Berryman edited comment on LUCENE-4922 at 5/10/13 2:38 AM:
----------------------------------------------------------------

Hmmm... integer representation huh. Well here's a thought then:

As a first go at this idea, let's define something like a geohash where x and 
y are interleaved, but here's how we do it. At the top level, number the 
squares from 0 to 3.

{noformat}
   0 --- 1
   |     |
   2 --- 3
{noformat}

At the next level, number things similarly:

{noformat}
    00 -- 01 -- 10 -- 11
    |     |     |     |
    02 -- 03 -- 12 -- 13
    |     |     |     |
    20 -- 21 -- 30 -- 31
    |     |     |     |
    22 -- 23 -- 32 -- 33
{noformat}

Even though this *looks* like the Hilbert thing I did above, notice that this 
is actually the [Z-ordering|http://en.wikipedia.org/wiki/Z-order_curve], which 
is a little easier to compute.
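
To make the "easier to compute" part concrete, here is a minimal Python sketch that rebuilds the grids above by interleaving the bits of a cell's row and column. The names cell_label and grid are made up for illustration, and it assumes row 0 is the top row, as drawn.

```python
def cell_label(row, col, depth):
    """Base-4 Z-order label of the cell at (row, col); row 0 is the top row."""
    digits = []
    # Walk from the most significant bit down: one row bit and one column
    # bit together pick one of the four boxes at that level.
    for bit in range(depth - 1, -1, -1):
        r = (row >> bit) & 1
        c = (col >> bit) & 1
        digits.append(str(2 * r + c))
    return "".join(digits)


def grid(depth):
    """All labels of the 2**depth by 2**depth grid, as a list of rows."""
    n = 1 << depth
    return [[cell_label(r, c, depth) for c in range(n)] for r in range(n)]
```

grid(1) and grid(2) give the same rows as the two diagrams.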

In this case, the first two bits encode which of the four big boxes the point 
is in, the next two bits encode which of the four sub-boxes the point is in, 
etc. So for example [0.375, 0.625] (with y increasing upward) would be encoded 
to a depth of 2 as "03", which takes 4 bits and so can be stored in half a byte.
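
A rough sketch of that encoding in Python, assuming the unit square with y increasing upward (so box 0 is the top-left, as in the diagrams). z_encode and pack are hypothetical helper names, and the most-significant-pair-first byte layout is just one possible choice.

```python
def z_encode(x, y, depth):
    """Encode a point of the unit square as a list of box digits (0-3).

    Assumed layout: 0 = top-left, 1 = top-right, 2 = bottom-left,
    3 = bottom-right, with y increasing upward.
    """
    digits = []
    for _ in range(depth):
        x *= 2
        y *= 2
        right = int(x >= 1)   # 1 if the point is in the right half
        lower = int(y < 1)    # 1 if the point is in the lower half
        digits.append(2 * lower + right)
        # Rescale so the chosen sub-box becomes the unit square again.
        x -= right
        y -= 1 - lower
    return digits


def pack(digits):
    """Pack box digits into bytes, two bits each, most significant pair first."""
    out = bytearray((len(digits) + 3) // 4)
    for i, d in enumerate(digits):
        out[i // 4] |= d << (6 - 2 * (i % 4))
    return bytes(out)
```

With this, z_encode(0.375, 0.625, 2) gives [0, 3], and pack([0, 3]) puts those four bits in the top half of a single byte.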

Got it? So... now that we have the original point encoded in Z-ordering, we 
can create a new hilbert_point algorithm that takes a byte array representing 
the Z-ordering encoding of a point rather than a 2-vector of doubles. And the 
code looks much the same, except that instead of the "val[0]*2" etc. we're 
actually just iterating through the byte array 2 bits at a time, which is 
effectively the same as multiplying by 2.
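
One way this could look, as a Python sketch under stated assumptions rather than a definitive implementation. The function name z_to_hilbert, the byte layout (two bits per level, most significant pair first) and the curve orientation (starting in box 0 and ending in box 1) are all assumptions. The rotation bookkeeping follows the usual xy-to-Hilbert-index construction, carried along as two bits of state.

```python
def z_to_hilbert(z_bytes, depth):
    """Convert a packed Z-order code into a Hilbert index (an int).

    z_bytes holds one 2-bit box digit per level, most significant pair
    first; each digit is 2 * (lower half) + (right half).
    """
    swap = 0   # 1 while the x and y roles are exchanged in the current box
    flip = 0   # 1 while both axes are mirrored in the current box
    h = 0
    for i in range(depth):
        d = (z_bytes[i // 4] >> (6 - 2 * (i % 4))) & 3
        # Apply the orientation inherited from the enclosing boxes.
        rx = (d & 1) ^ flip
        ry = (d >> 1) ^ flip
        if swap:
            rx, ry = ry, rx
        # Position of this box along the curve at this level.
        h = (h << 2) | ((3 * rx) ^ ry)
        # The first and last boxes along the curve change the orientation
        # of everything inside them.
        if ry == 0:
            flip ^= rx
            swap ^= 1
    return h
```

Each level is a single pass over two bits plus two bits of state, so there is no backtracking. For the example point, z_to_hilbert(b"\x30", 2) returns 2, i.e. "02" in base 4.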

This would make for some exquisitely indecipherable byte-munging code. But 
would it ultimately be more efficient? It largely depends upon how complex the 
Z-ordering encoding is. What do you think?
                
> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-4922
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4922
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>              Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height, then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells, which occur towards the poles and degrade 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.
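
The cell-splitting rule in the last bullet can be sketched as follows, in Python. The function name subdivide and the (x, y, width, height) tuple layout are assumptions for illustration, not anything from the Lucene spatial module.

```python
def subdivide(x, y, width, height):
    """Split a cell into four children, each an (x, y, width, height) tuple.

    A cell narrower than half its height is cut into 4 vertically stacked
    children; any other cell is cut 2 by 2.
    """
    if width < height / 2:
        h = height / 4
        return [(x, y + i * h, width, h) for i in range(4)]
    w, h = width / 2, height / 2
    return [(x + i * w, y + j * h, w, h) for j in range(2) for i in range(2)]
```

For a 1 by 4 cell this yields four 1 by 1 children instead of four 0.5 by 2 slivers.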
