Thanks Mike and Taewoo for your comments. I'll give it a try.

Tin

On Mon, Aug 24, 2020 at 12:17 PM Taewoo Kim <[email protected]> wrote:

> Agreed with Mike regarding a native approach to spatial joins. :-) I just
> tried to point to a location for FuzzyJoinRule.
>
> Best,
> Taewoo
>
>
> On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <[email protected]> wrote:
>
> > Tin,
> >
> > 1. There is no support for SQL++ in templates. Just AQL.  And AQL is now
> > deprecated, in a mode where it is only supported to keep that one fuzzy
> > join rule alive via AQL+.  The most recent public release of AsterixDB
> > is the last one where AQL+ will work - and then AQL will completely
> > disappear from the code base (as will the scripting framework) going
> > forward, as it is totally unsupported.  (Its inventor left with his PhD
> > in 2011.  Taewoo Kim, below, did a really nice refactoring revision of
> > the framework more recently, to remove lots of replicated code between
> > AQL and AQL+, but he is also no longer focused on AsterixDB work - his
> > post-PhD day job seems to be a distraction. :-))
> >
> > 2. More information about the scripting framework and its use for the
> > fuzzy join rule is in Chapter 5 of the aforementioned PhD student's
> > thesis:
> > http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.
> >
> > 3. I suspect it would be significantly messy work to create a SQL++
> > alternative to AQL as the basis for the template framework...  If you
> > want to explore a template-based approach, I'd stick with AQL+ and just
> > be aware that it's kind of a "dead end" approach long-term in the
> > AsterixDB code base - but could be explored as a research branch using
> > 0.9.5 as the branching point.
> >
> > I think a native approach to spatial joins would be better than a
> > template-based approach - along the lines of what Preston Carmon et al
> > are doing for interval joins up at Walla Walla U in Washington state.
> >
> > Cheers,
> >
> > Mike
> >
> > On 8/24/20 11:11 AM, Taewoo Kim wrote:
> > > Hi Tin,
> > >
> > > Here is a paper that explains FuzzyJoinRule in general. Hope this
> helps.
> > >
> > >
> >
> https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=
> > >
> > > 1. What is the syntax to make placeholders for input operators and
> > > variables in a SQL++ template? In the FuzzyJoinRule
> > > <
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > > implementation
> > > <
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> ,
> > > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0
> for
> > > left input variable. Is it similar in SQL++?
> > > You need to implement an extended version of SQL++ (maybe SQL+++) that
> > > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java
> > >
> > > 2. Does the query template for substitution must be a (full) compilable
> > > query? Or it can be just a part of a full query?
> > > It has to be a complete query that is executable in SQL+++ (if you
> create
> > > one).
> > >
> > > 3. Can the substitute query contain UDF functions?
> > > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an
> > extended
> > > version.
> > >
> > > Best,
> > > Taewoo
> > >
> > >
> > > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm working on a project which aims to improve performance of spatial
> > join
> > >> query in AsterixDB.
> > >>
> > >> The problem can be described as follows:
> > >>
> > >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom)
> > with
> > >> geom is a spatial data type (point, rectangle, polygon). This is a
> join
> > >> query to list all intersected pairs of these two datasets:
> > >> ```
> > >>
> > >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> > >> spatial_intersect(ps.geom, ls.geom);*
> > >> *```*
> > >>
> > >> I rewrote this query in a more complicated SQL++ query(*) but has a
> > better
> > >> performance.
> > >>
> > >> I took a look at FuzzyJoinRule
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> implementation
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> (a
> > >> rule for AQL+) in AsterixDB and I'm trying to make a similar
> > implementation
> > >> called SpatalJoinRule
> > >> <
> > >>
> >
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> > >> (a
> > >> rule for SQL++). Now I'm stuck with some questions and I would
> > >> really appreciate if you can help me to answer them:
> > >>
> > >> 1. What is the syntax to make placeholders for input operators and
> > >> variables in a SQL++ template? In the FuzzyJoinRule
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> implementation
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >>> ,
> > >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0
> for
> > >> left input variable. Is it similar in SQL++?
> > >>
> > >> 2. Does the query template for substitution must be a (full)
> compilable
> > >> query? Or it can be just a part of a full query?
> > >>
> > >> 3. Can the substitute query contain UDF functions?
> > >>
> > >> Thanks,
> > >>
> > >> Tin
> > >>
> > >> (*) Rewitten query: in short, it partitioned the datasets by a grid
> (red
> > >> statement) then we compute the intersection pairs for each cell of the
> > >> grid.
> > >> - 'mytileids' UDF function returns the corresponding cell ID of a
> > spatial
> > >> object.
> > >> - 'referencepointtileid' UDF function is used to ignore the
> > duplications in
> > >> the final result.
> > >> - (-180.0,83.0,180.0,90.0) is space MBR.
> > >> - (100,100) is grid size (rows x columns).
> > >>
> > >> ```
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> > >> lakesPartitioned.tile as tile2,
> > >> test#referencepointtileid(parksPartitioned.geom,
> > >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> > >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> > >> (SELECT
> > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> > FROM
> > >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS
> > parksPartitioned,(SELECT
> > >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> > FROM
> > >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> > >> parksPartitioned.tile = lakesPartitioned.tileAND
> > >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS
> > result
> > >> WHERE result.tile1 = result.ref_tile;*
> > >> *```*
> > >>
> >
>

Reply via email to