+1 for the feature, sounds interesting, thanks Julian. As a flink user, watch to see more how this would be applied to flink.
On Thu, Jun 29, 2017 at 3:22 AM, Julian Hyde <[email protected]> wrote: > PS The author of the paper referenced in that StackOverflow article[4], > John Skilling, was my applied mathematics supervisor in Cambridge. It’s a > small world. > > [4] http://ratml.org/misc/hilbert_curve.pdf <http://ratml.org/misc/ > hilbert_curve.pdf> > > > > On Jun 28, 2017, at 11:42 AM, Atri Sharma <[email protected]> wrote: > > > > How do you propose we start here? I can start writing a spec based on > > the implementation, based on how Postgres and others implement them. > > > > On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <[email protected]> wrote: > >> No, anyone brave enough to be an early adopter would already be on this > list. > >> > >> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) > can be proxies for their their communities. And others who are building > interesting stuff I don’t know about. > >> > >> Julian > >> > >>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <[email protected]> wrote: > >>> > >>> Should we start a thread with potential users, eg Phoenix community? > >>> > >>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <[email protected]> > wrote: > >>>> I’d also like to hear from potential users of this feature. They > could try this functionality as it becomes available, and help us > prioritize features. > >>>> > >>>> Julian > >>>> > >>>> > >>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <[email protected]> > wrote: > >>>>> > >>>>> And I have created the JIRA: > >>>>> > >>>>> https://issues.apache.org/jira/browse/CALCITE-1861 > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <[email protected]> > wrote: > >>>>>> Is anyone looking for a neat project in Calcite that would have a > big > >>>>>> impact? I'm thinking that we could add support for spatial indexes > to > >>>>>> Calcite in such a way that downstream projects such as Phoenix and > >>>>>> Flink could easily benefit from it. > >>>>>> > >>>>>> GIS (Geographic Information Systems, aka Spatial database) is really > >>>>>> useful functionality to have in your database. To find restaurants > >>>>>> less than 1 km from downtown San Francisco, you could run > >>>>>> > >>>>>> select * > >>>>>> from restaurants as r > >>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1; > >>>>>> > >>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle > Spatial > >>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL > >>>>>> extensions[1]. > >>>>>> > >>>>>> Now, the SQL-GIS standard is rather large, and involves implementing > >>>>>> lots of data types and scalar functions. We could get to that > >>>>>> eventually. But I contend that many, many applications would be > >>>>>> satisfied by points and distances (like the query above) and a > spatial > >>>>>> index to make them run quickly. And I believe that we can add > spatial > >>>>>> index support to Calcite using a logical rewrite rule. > >>>>>> > >>>>>> Rewriting spatial queries to indexes on space-filling curves is a > >>>>>> well-established technique [2]. > >>>>>> > >>>>>> Suppose that the restaurants table, above, had columns latitude and > >>>>>> longitude and a computed numeric column h = hilbert(latitude, > >>>>>> longitude). Hilbert curves are space-filling curves such that if two > >>>>>> points are close in space then their h values will be close. So, if > >>>>>> there is an index on h, we can find all restaurants close to a given > >>>>>> point using a range scan of the index. > >>>>>> > >>>>>> So, the above query could be rewritten to something like > >>>>>> > >>>>>> select * > >>>>>> from restaurants as r > >>>>>> where (r.h between 123456 and 123599 > >>>>>> or r.h between 256789 and 259887) > >>>>>> and st_distance_internal(point(-122.4194, 37.7749), > r.coordinates) <= 1; > >>>>>> > >>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in > the > >>>>>> database, and the call to st_distance_internal eliminates the > >>>>>> remaining false positives. > >>>>>> > >>>>>> That rewrite can be done using a logical rewrite rule, and the > >>>>>> resulting query will be faster on just about any database, but > >>>>>> especially one with key-sorted tables (like Phoenix/HBase) or > >>>>>> range-partitioned tables. The database does not need to have a > >>>>>> dedicated "spatial index" data structure. > >>>>>> > >>>>>> Julian > >>>>>> > >>>>>> [1] http://www.opengeospatial.org/standards/sfs > >>>>>> > >>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Regards, > >>>>> > >>>>> Atri > >>>>> l'apprenant > >>>> > >>> > >>> > >>> > >>> -- > >>> Regards, > >>> > >>> Atri > >>> l'apprenant > >> > > > > > > > > -- > > Regards, > > > > Atri > > l'apprenant > >
