I don't know how much I can contribute but it sounds like a great way to expand Calcite. Good idea Julian!
On Jun 27, 2017 7:06 PM, "Atri Sharma" <[email protected]> wrote: > I am all for it. This sounds really interesting. > > On Jun 28, 2017 7:02 AM, "Julian Hyde" <[email protected]> wrote: > > > Is anyone looking for a neat project in Calcite that would have a big > > impact? I'm thinking that we could add support for spatial indexes to > > Calcite in such a way that downstream projects such as Phoenix and > > Flink could easily benefit from it. > > > > GIS (Geographic Information Systems, aka Spatial database) is really > > useful functionality to have in your database. To find restaurants > > less than 1 km from downtown San Francisco, you could run > > > > select * > > from restaurants as r > > where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1; > > > > There are mature SQL implementations of GIS in PostGIS, Oracle Spatial > > and Microsoft SQL Server; and OpenGIS has standardized SQL > > extensions[1]. > > > > Now, the SQL-GIS standard is rather large, and involves implementing > > lots of data types and scalar functions. We could get to that > > eventually. But I contend that many, many applications would be > > satisfied by points and distances (like the query above) and a spatial > > index to make them run quickly. And I believe that we can add spatial > > index support to Calcite using a logical rewrite rule. > > > > Rewriting spatial queries to indexes on space-filling curves is a > > well-established technique [2]. > > > > Suppose that the restaurants table, above, had columns latitude and > > longitude and a computed numeric column h = hilbert(latitude, > > longitude). Hilbert curves are space-filling curves such that if two > > points are close in space then their h values will be close. So, if > > there is an index on h, we can find all restaurants close to a given > > point using a range scan of the index. > > > > So, the above query could be rewritten to something like > > > > select * > > from restaurants as r > > where (r.h between 123456 and 123599 > > or r.h between 256789 and 259887) > > and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) > <= > > 1; > > > > The range predicates on r.h quickly eliminate 99.9% of the rows in the > > database, and the call to st_distance_internal eliminates the > > remaining false positives. > > > > That rewrite can be done using a logical rewrite rule, and the > > resulting query will be faster on just about any database, but > > especially one with key-sorted tables (like Phoenix/HBase) or > > range-partitioned tables. The database does not need to have a > > dedicated "spatial index" data structure. > > > > Julian > > > > [1] http://www.opengeospatial.org/standards/sfs > > > > [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf > > >
