Re: Spatial indexes

Liangfei Su Wed, 28 Jun 2017 17:11:34 -0700

+1 for the feature, sounds interesting, thanks Julian. As a Flink user,
I'd like to see more about how this would be applied to Flink.


On Thu, Jun 29, 2017 at 3:22 AM, Julian Hyde <[email protected]> wrote:

> PS The author of the paper referenced in that StackOverflow article[4],
> John Skilling, was my applied mathematics supervisor in Cambridge. It’s a
> small world.
>
> [4] http://ratml.org/misc/hilbert_curve.pdf
>
>
> > On Jun 28, 2017, at 11:42 AM, Atri Sharma <[email protected]> wrote:
> >
> > How do you propose we start here? I can start writing a spec based on
> > how Postgres and others implement spatial indexes.
> >
> > On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <[email protected]> wrote:
> >> No, anyone brave enough to be an early adopter would already be on this
> >> list.
> >>
> >> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill)
> >> can be proxies for their communities. And others who are building
> >> interesting stuff I don’t know about.
> >>
> >> Julian
> >>
> >>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <[email protected]> wrote:
> >>>
> >>> Should we start a thread with potential users, eg Phoenix community?
> >>>
> >>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <[email protected]> wrote:
> >>>> I’d also like to hear from potential users of this feature. They
> >>>> could try this functionality as it becomes available, and help us
> >>>> prioritize features.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <[email protected]> wrote:
> >>>>>
> >>>>> And I have created the JIRA:
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/CALCITE-1861
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <[email protected]> wrote:
> >>>>>> Is anyone looking for a neat project in Calcite that would have a big
> >>>>>> impact? I'm thinking that we could add support for spatial indexes to
> >>>>>> Calcite in such a way that downstream projects such as Phoenix and
> >>>>>> Flink could easily benefit from it.
> >>>>>>
> >>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
> >>>>>> useful functionality to have in your database. To find restaurants
> >>>>>> less than 1 km from downtown San Francisco, you could run
> >>>>>>
> >>>>>> select *
> >>>>>> from restaurants as r
> >>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
> >>>>>>
> >>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> >>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
> >>>>>> extensions[1].
> >>>>>>
> >>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
> >>>>>> lots of data types and scalar functions. We could get to that
> >>>>>> eventually. But I contend that many, many applications would be
> >>>>>> satisfied by points and distances (like the query above) and a spatial
> >>>>>> index to make them run quickly. And I believe that we can add spatial
> >>>>>> index support to Calcite using a logical rewrite rule.
> >>>>>>
> >>>>>> Rewriting spatial queries to indexes on space-filling curves is a
> >>>>>> well-established technique [2].
> >>>>>>
> >>>>>> Suppose that the restaurants table, above, had columns latitude and
> >>>>>> longitude and a computed numeric column h = hilbert(latitude,
> >>>>>> longitude). Hilbert curves are space-filling curves such that if two
> >>>>>> points are close in space then their h values will usually be close. So, if
> >>>>>> there is an index on h, we can find all restaurants close to a given
> >>>>>> point using a range scan of the index.
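
The computed column described above could be sketched as follows. This is only an illustration, not Calcite code: the name hilbert comes from the message, while the helper names xy2d and _rotate, the 16-bit grid order, and the linear mapping of degrees to grid cells are all assumptions made for the example. The cell-to-index routine is the commonly published iterative Hilbert curve conversion.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x  # swap axes
    return x, y


def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert(latitude, longitude, order=16):
    """Hypothetical h column: quantize degrees onto a 2^order grid
    (an assumed mapping), then take the Hilbert position of the cell."""
    n = 1 << order
    x = min(n - 1, int((longitude + 180.0) / 360.0 * n))
    y = min(n - 1, int((latitude + 90.0) / 180.0 * n))
    return xy2d(n, x, y)


if __name__ == "__main__":
    # h value for the downtown San Francisco point used in the query.
    print(hilbert(37.7749, -122.4194))
```

Locality is not perfect: two neighbouring cells on opposite sides of a quadrant boundary can have distant h values, so a search region may map to more than one range of h.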
> >>>>>>
> >>>>>> So, the above query could be rewritten to something like
> >>>>>>
> >>>>>> select *
> >>>>>> from restaurants as r
> >>>>>> where (r.h between 123456 and 123599
> >>>>>>    or r.h between 256789 and 259887)
> >>>>>>  and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
> >>>>>>
> >>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
> >>>>>> database, and the call to st_distance_internal eliminates the
> >>>>>> remaining false positives.
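
The two-step filter described above (coarse range test on h, then an exact distance check) could be sketched like this. Everything here is an assumption for illustration: the function names covering_ranges and within are hypothetical, coordinates are integer grid cells rather than degrees, plain Euclidean distance stands in for st_distance_internal, and the ranges are found by brute-force enumeration of the bounding box, where a real rule would more likely decompose the curve recursively.

```python
import math


def xy2d(n, x, y):
    """Hilbert position of cell (x, y) in an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def covering_ranges(n, x0, y0, x1, y1):
    """Inclusive ranges of h covering every cell of the box
    [x0..x1] x [y0..y1], with adjacent values merged."""
    ds = sorted(xy2d(n, x, y)
                for x in range(x0, x1 + 1)
                for y in range(y0, y1 + 1))
    ranges = []
    for d in ds:
        if ranges and d == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], d)  # extend the current range
        else:
            ranges.append((d, d))            # start a new range
    return ranges


def within(rows, n, cx, cy, radius):
    """rows is an iterable of (x, y, h). Return the (x, y) pairs that
    lie within radius of the cell (cx, cy)."""
    r = math.ceil(radius)
    ranges = covering_ranges(n, max(0, cx - r), max(0, cy - r),
                             min(n - 1, cx + r), min(n - 1, cy + r))
    result = []
    for x, y, h in rows:
        # Step 1: cheap range predicates on h (what an index would serve).
        if any(lo <= h <= hi for lo, hi in ranges):
            # Step 2: exact check removes the remaining false positives.
            if math.hypot(x - cx, y - cy) <= radius:
                result.append((x, y))
    return result


if __name__ == "__main__":
    print(covering_ranges(4, 1, 0, 2, 0))
```

In a 4-by-4 grid, the two horizontally adjacent cells (1, 0) and (2, 0) fall in different quadrants and so produce two separate ranges, which is why the rewritten query needs more than one BETWEEN predicate.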
> >>>>>>
> >>>>>> That rewrite can be done using a logical rewrite rule, and the
> >>>>>> resulting query will be faster on just about any database, but
> >>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
> >>>>>> range-partitioned tables. The database does not need to have a
> >>>>>> dedicated "spatial index" data structure.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>> [1] http://www.opengeospatial.org/standards/sfs
> >>>>>>
> >>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Regards,
> >>>>>
> >>>>> Atri
> >>>>> l'apprenant
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Atri
> >>> l'apprenant
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > l'apprenant
>
>
