Should we start a thread with potential users, eg Phoenix community? On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <[email protected]> wrote: > I’d also like to hear from potential users of this feature. They could try > this functionality as it becomes available, and help us prioritize features. > > Julian > > >> On Jun 28, 2017, at 9:10 AM, Atri Sharma <[email protected]> wrote: >> >> And I have created the JIRA: >> >> https://issues.apache.org/jira/browse/CALCITE-1861 >> >> >> >> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <[email protected]> wrote: >>> Is anyone looking for a neat project in Calcite that would have a big >>> impact? I'm thinking that we could add support for spatial indexes to >>> Calcite in such a way that downstream projects such as Phoenix and >>> Flink could easily benefit from it. >>> >>> GIS (Geographic Information Systems, aka Spatial database) is really >>> useful functionality to have in your database. To find restaurants >>> less than 1 km from downtown San Francisco, you could run >>> >>> select * >>> from restaurants as r >>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1; >>> >>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial >>> and Microsoft SQL Server; and OpenGIS has standardized SQL >>> extensions[1]. >>> >>> Now, the SQL-GIS standard is rather large, and involves implementing >>> lots of data types and scalar functions. We could get to that >>> eventually. But I contend that many, many applications would be >>> satisfied by points and distances (like the query above) and a spatial >>> index to make them run quickly. And I believe that we can add spatial >>> index support to Calcite using a logical rewrite rule. >>> >>> Rewriting spatial queries to indexes on space-filling curves is a >>> well-established technique [2]. >>> >>> Suppose that the restaurants table, above, had columns latitude and >>> longitude and a computed numeric column h = hilbert(latitude, >>> longitude). Hilbert curves are space-filling curves such that if two >>> points are close in space then their h values will be close. So, if >>> there is an index on h, we can find all restaurants close to a given >>> point using a range scan of the index. >>> >>> So, the above query could be rewritten to something like >>> >>> select * >>> from restaurants as r >>> where (r.h between 123456 and 123599 >>> or r.h between 256789 and 259887) >>> and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1; >>> >>> The range predicates on r.h quickly eliminate 99.9% of the rows in the >>> database, and the call to st_distance_internal eliminates the >>> remaining false positives. >>> >>> That rewrite can be done using a logical rewrite rule, and the >>> resulting query will be faster on just about any database, but >>> especially one with key-sorted tables (like Phoenix/HBase) or >>> range-partitioned tables. The database does not need to have a >>> dedicated "spatial index" data structure. >>> >>> Julian >>> >>> [1] http://www.opengeospatial.org/standards/sfs >>> >>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf >> >> >> >> -- >> Regards, >> >> Atri >> l'apprenant >
-- Regards, Atri l'apprenant
