Re: Substitute SQL++ sub-query

Taewoo Kim Mon, 24 Aug 2020 12:17:43 -0700

Agreed with Mike regarding a native approach to spatial joins. :-) I just
tried to point to a location for FuzzyJoinRule.


Best,
Taewoo


On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <[email protected]> wrote:

> Tin,
>
> 1. There is no support for SQL++ in templates. Just AQL.  And AQL is now
> deprecated, in a mode where it is only supported to keep that one fuzzy
> join rule alive via AQL+.  The most recent public release of AsterixDB
> is the last one where AQL+ will work - and then AQL will completely
> disappear from the code base (as will the scripting framework) going
> forward, as it is totally unsupported.  (Its inventor left with his PhD
> in 2011.  Taewoo Kim, below, did a really nice refactoring revision of
> the framework more recently, to remove lots of replicated code between
> AQL and AQL+, but he is also no longer focused on AsterixDB work - his
> post-PhD day job seems to be a distraction. :-))
>
> 2. More information about the scripting framework and its use for the
> fuzzy join rule is in Chapter 5 of the aforementioned PhD student's
> thesis:
> http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.
>
> 3. I suspect it would be significantly messy work to create a SQL++
> alternative to AQL as the basis for the template framework...  If you
> want to explore a template-based approach, I'd stick with AQL+ and just
> be aware that it's kind of a "dead end" approach long-term in the
> AsterixDB code base - but could be explored as a research branch using
> 0.9.5 as the branching point.
>
> I think a native approach to spatial joins would be better than a
> template-based approach - along the lines of what Preston Carmon et al
> are doing for interval joins up at Walla Walla U in Washington state.
>
> Cheers,
>
> Mike
>
> On 8/24/20 11:11 AM, Taewoo Kim wrote:
> > Hi Tin,
> >
> > Here is a paper that explains FuzzyJoinRule in general. Hope this helps.
> >
> >
> https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=
> >
> > 1. What is the syntax to make placeholders for input operators and
> > variables in a SQL++ template? In the FuzzyJoinRule
> > <
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > implementation
> > <
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> ,
> > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> > left input variable. Is it similar in SQL++?
> > You need to implement an extended version of SQL++ (maybe SQL+++) that
> > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java
> >
> > 2. Does the query template for substitution must be a (full) compilable
> > query? Or it can be just a part of a full query?
> > It has to be a complete query that is executable in SQL+++ (if you create
> > one).
> >
> > 3. Can the substitute query contain UDF functions?
> > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an
> extended
> > version.
> >
> > Best,
> > Taewoo
> >
> >
> > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I'm working on a project which aims to improve performance of spatial
> join
> >> query in AsterixDB.
> >>
> >> The problem can be described as follows:
> >>
> >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom)
> with
> >> geom is a spatial data type (point, rectangle, polygon). This is a join
> >> query to list all intersected pairs of these two datasets:
> >> ```
> >>
> >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> >> spatial_intersect(ps.geom, ls.geom);*
> >> *```*
> >>
> >> I rewrote this query in a more complicated SQL++ query(*) but has a
> better
> >> performance.
> >>
> >> I took a look at FuzzyJoinRule
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> implementation
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> (a
> >> rule for AQL+) in AsterixDB and I'm trying to make a similar
> implementation
> >> called SpatalJoinRule
> >> <
> >>
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> >> (a
> >> rule for SQL++). Now I'm stuck with some questions and I would
> >> really appreciate if you can help me to answer them:
> >>
> >> 1. What is the syntax to make placeholders for input operators and
> >> variables in a SQL++ template? In the FuzzyJoinRule
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> implementation
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >>> ,
> >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> >> left input variable. Is it similar in SQL++?
> >>
> >> 2. Does the query template for substitution must be a (full) compilable
> >> query? Or it can be just a part of a full query?
> >>
> >> 3. Can the substitute query contain UDF functions?
> >>
> >> Thanks,
> >>
> >> Tin
> >>
> >> (*) Rewitten query: in short, it partitioned the datasets by a grid (red
> >> statement) then we compute the intersection pairs for each cell of the
> >> grid.
> >> - 'mytileids' UDF function returns the corresponding cell ID of a
> spatial
> >> object.
> >> - 'referencepointtileid' UDF function is used to ignore the
> duplications in
> >> the final result.
> >> - (-180.0,83.0,180.0,90.0) is space MBR.
> >> - (100,100) is grid size (rows x columns).
> >>
> >> ```
> >>
> >>
> >>
> >>
> >>
> >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> >> lakesPartitioned.tile as tile2,
> >> test#referencepointtileid(parksPartitioned.geom,
> >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> >> (SELECT
> >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> FROM
> >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS
> parksPartitioned,(SELECT
> >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> FROM
> >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> >> parksPartitioned.tile = lakesPartitioned.tileAND
> >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS
> result
> >> WHERE result.tile1 = result.ref_tile;*
> >> *```*
> >>
>

Re: Substitute SQL++ sub-query

Reply via email to