I am all for it. This sounds really interesting.

On Jun 28, 2017 7:02 AM, "Julian Hyde" <[email protected]> wrote:

> Is anyone looking for a neat project in Calcite that would have a big
> impact? I'm thinking that we could add support for spatial indexes to
> Calcite in such a way that downstream projects such as Phoenix and
> Flink could easily benefit from it.
>
> GIS (Geographic Information Systems, aka Spatial database) is really
> useful functionality to have in your database. To find restaurants
> less than 1 km from downtown San Francisco, you could run
>
>   select *
>   from restaurants as r
>   where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>
> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> and Microsoft SQL Server; and OpenGIS has standardized SQL
> extensions[1].
>
> Now, the SQL-GIS standard is rather large, and involves implementing
> lots of data types and scalar functions. We could get to that
> eventually. But I contend that many, many applications would be
> satisfied by points and distances (like the query above) and a spatial
> index to make them run quickly. And I believe that we can add spatial
> index support to Calcite using a logical rewrite rule.
>
> Rewriting spatial queries to indexes on space-filling curves is a
> well-established technique [2].
>
> Suppose that the restaurants table, above, had columns latitude and
> longitude and a computed numeric column h = hilbert(latitude,
> longitude). Hilbert curves are space-filling curves such that if two
> points are close in space then their h values will be close. So, if
> there is an index on h, we can find all restaurants close to a given
> point using a range scan of the index.
>
> So, the above query could be rewritten to something like
>
>   select *
>   from restaurants as r
>   where (r.h between 123456 and 123599
>       or r.h between 256789 and 259887)
>     and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <=
> 1;
>
> The range predicates on r.h quickly eliminate 99.9% of the rows in the
> database, and the call to st_distance_internal eliminates the
> remaining false positives.
>
> That rewrite can be done using a logical rewrite rule, and the
> resulting query will be faster on just about any database, but
> especially one with key-sorted tables (like Phoenix/HBase) or
> range-partitioned tables. The database does not need to have a
> dedicated "spatial index" data structure.
>
> Julian
>
> [1] http://www.opengeospatial.org/standards/sfs
>
> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>

Reply via email to