Re: Spatial indexes

Atri Sharma Wed, 28 Jun 2017 10:29:47 -0700

Should we start a thread with potential users, eg Phoenix community?

On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <[email protected]> wrote:
> I’d also like to hear from potential users of this feature. They could try 
> this functionality as it becomes available, and help us prioritize features.
>
> Julian
>
>
>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <[email protected]> wrote:
>>
>> And I have created the JIRA:
>>
>> https://issues.apache.org/jira/browse/CALCITE-1861
>>
>>
>>
>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <[email protected]> wrote:
>>> Is anyone looking for a neat project in Calcite that would have a big
>>> impact? I'm thinking that we could add support for spatial indexes to
>>> Calcite in such a way that downstream projects such as Phoenix and
>>> Flink could easily benefit from it.
>>>
>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>> useful functionality to have in your database. To find restaurants
>>> less than 1 km from downtown San Francisco, you could run
>>>
>>>  select *
>>>  from restaurants as r
>>>  where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>
>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>> extensions[1].
>>>
>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>> lots of data types and scalar functions. We could get to that
>>> eventually. But I contend that many, many applications would be
>>> satisfied by points and distances (like the query above) and a spatial
>>> index to make them run quickly. And I believe that we can add spatial
>>> index support to Calcite using a logical rewrite rule.
>>>
>>> Rewriting spatial queries to indexes on space-filling curves is a
>>> well-established technique [2].
>>>
>>> Suppose that the restaurants table, above, had columns latitude and
>>> longitude and a computed numeric column h = hilbert(latitude,
>>> longitude). Hilbert curves are space-filling curves such that if two
>>> points are close in space then their h values will be close. So, if
>>> there is an index on h, we can find all restaurants close to a given
>>> point using a range scan of the index.
>>>
>>> So, the above query could be rewritten to something like
>>>
>>>  select *
>>>  from restaurants as r
>>>  where (r.h between 123456 and 123599
>>>      or r.h between 256789 and 259887)
>>>    and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>
>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>> database, and the call to st_distance_internal eliminates the
>>> remaining false positives.
>>>
>>> That rewrite can be done using a logical rewrite rule, and the
>>> resulting query will be faster on just about any database, but
>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>> range-partitioned tables. The database does not need to have a
>>> dedicated "spatial index" data structure.
>>>
>>> Julian
>>>
>>> [1] http://www.opengeospatial.org/standards/sfs
>>>
>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>
>>
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>




-- 
Regards,

Atri
l'apprenant

Re: Spatial indexes

Reply via email to