I would like to start with a test schema (maybe a single table backed by a CSV 
file), a query that finds points within a given distance of a point, and do a 
rewrite of that query to use a range scan of a hilbert column.

Calcite would not have a “CREATE [SPATIAL] INDEX” command, but the rewrite rule 
would be sufficient for other systems to create one. (Similar to how Phoenix 
and Druid use rewrites to target their indexes.)

Let’s stick to flat-earth Euclidean geometry for now. I.e. points have 
unit-less coordinates, not latitude and longitude. The distance between (0, 0) 
and (4, 3) is exactly 5, per Pythagoras’ theorem.

As for standards, let's reference PostGIS, but also let’s use OpenGIS SQL 
(reference [1] in my original email).

You are going to have some fun with Hilbert curves. You’ll need not only “long 
hilbert(int, int)”, but also to compute the ranges on the hilbert curve that 
are within a circle or rectangle. Should we work on integers or floating point 
numbers? Hilbert functions on integers are nice, because it’s just 
bit-twiddling [3].

Julian

[3] 
https://stackoverflow.com/questions/499166/mapping-n-dimensional-value-to-a-point-on-hilbert-curve
 
<https://stackoverflow.com/questions/499166/mapping-n-dimensional-value-to-a-point-on-hilbert-curve>

> On Jun 28, 2017, at 11:42 AM, Atri Sharma <[email protected]> wrote:
> 
> How do you propose we start here? I can start writing a spec based on
> the implementation, based on how Postgres and others implement them.
> 
> On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <[email protected]> wrote:
>> No, anyone brave enough to be an early adopter would already be on this list.
>> 
>> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) can 
>> be proxies for their their communities. And others who are building 
>> interesting stuff I don’t know about.
>> 
>> Julian
>> 
>>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <[email protected]> wrote:
>>> 
>>> Should we start a thread with potential users, eg Phoenix community?
>>> 
>>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <[email protected]> wrote:
>>>> I’d also like to hear from potential users of this feature. They could try 
>>>> this functionality as it becomes available, and help us prioritize 
>>>> features.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <[email protected]> wrote:
>>>>> 
>>>>> And I have created the JIRA:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CALCITE-1861
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <[email protected]> wrote:
>>>>>> Is anyone looking for a neat project in Calcite that would have a big
>>>>>> impact? I'm thinking that we could add support for spatial indexes to
>>>>>> Calcite in such a way that downstream projects such as Phoenix and
>>>>>> Flink could easily benefit from it.
>>>>>> 
>>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>>>>> useful functionality to have in your database. To find restaurants
>>>>>> less than 1 km from downtown San Francisco, you could run
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>>>>> extensions[1].
>>>>>> 
>>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>>>>> lots of data types and scalar functions. We could get to that
>>>>>> eventually. But I contend that many, many applications would be
>>>>>> satisfied by points and distances (like the query above) and a spatial
>>>>>> index to make them run quickly. And I believe that we can add spatial
>>>>>> index support to Calcite using a logical rewrite rule.
>>>>>> 
>>>>>> Rewriting spatial queries to indexes on space-filling curves is a
>>>>>> well-established technique [2].
>>>>>> 
>>>>>> Suppose that the restaurants table, above, had columns latitude and
>>>>>> longitude and a computed numeric column h = hilbert(latitude,
>>>>>> longitude). Hilbert curves are space-filling curves such that if two
>>>>>> points are close in space then their h values will be close. So, if
>>>>>> there is an index on h, we can find all restaurants close to a given
>>>>>> point using a range scan of the index.
>>>>>> 
>>>>>> So, the above query could be rewritten to something like
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where (r.h between 123456 and 123599
>>>>>>    or r.h between 256789 and 259887)
>>>>>>  and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>>>>> database, and the call to st_distance_internal eliminates the
>>>>>> remaining false positives.
>>>>>> 
>>>>>> That rewrite can be done using a logical rewrite rule, and the
>>>>>> resulting query will be faster on just about any database, but
>>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>>>>> range-partitioned tables. The database does not need to have a
>>>>>> dedicated "spatial index" data structure.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] http://www.opengeospatial.org/standards/sfs
>>>>>> 
>>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> 
>>>>> Atri
>>>>> l'apprenant
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> Atri
>>> l'apprenant
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Atri
> l'apprenant

Reply via email to