Agreed with Mike regarding a native approach to spatial joins. :-) I just tried to point to a location for FuzzyJoinRule.
Best, Taewoo On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <[email protected]> wrote: > Tin, > > 1. There is no support for SQL++ in templates. Just AQL. And AQL is now > deprecated, in a mode where it is only supported to keep that one fuzzy > join rule alive via AQL+. The most recent public release of AsterixDB > is the last one where AQL+ will work - and then AQL will completely > disappear from the code base (as will the scripting framework) going > forward, as it is totally unsupported. (Its inventor left with his PhD > in 2011. Taewoo Kim, below, did a really nice refactoring revision of > the framework more recently, to remove lots of replicated code between > AQL and AQL+, but he is also no longer focused on AsterixDB work - his > post-PhD day job seems to be a distraction. :-)) > > 2. More information about the scripting framework and its use for the > fuzzy join rule is in Chapter 5 of the aforementioned PhD student's > thesis: > http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf. > > 3. I suspect it would be significantly messy work to create a SQL++ > alternative to AQL as the basis for the template framework... If you > want to explore a template-based approach, I'd stick with AQL+ and just > be aware that it's kind of a "dead end" approach long-term in the > AsterixDB code base - but could be explored as a research branch using > 0.9.5 as the branching point. > > I think a native approach to spatial joins would be better than a > template-based approach - along the lines of what Preston Carmon et al > are doing for interval joins up at Walla Walla U in Washington state. > > Cheers, > > Mike > > On 8/24/20 11:11 AM, Taewoo Kim wrote: > > Hi Tin, > > > > Here is a paper that explains FuzzyJoinRule in general. Hope this helps. > > > > > https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG= > > > > 1. What is the syntax to make placeholders for input operators and > > variables in a SQL++ template? In the FuzzyJoinRule > > < > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > implementation > > < > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >> , > > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for > > left input variable. Is it similar in SQL++? > > You need to implement an extended version of SQL++ (maybe SQL+++) that > > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+. > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java > > > > 2. Does the query template for substitution must be a (full) compilable > > query? Or it can be just a part of a full query? > > It has to be a complete query that is executable in SQL+++ (if you create > > one). > > > > 3. Can the substitute query contain UDF functions? > > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an > extended > > version. > > > > Best, > > Taewoo > > > > > > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote: > > > >> Hi all, > >> > >> I'm working on a project which aims to improve performance of spatial > join > >> query in AsterixDB. > >> > >> The problem can be described as follows: > >> > >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) > with > >> geom is a spatial data type (point, rectangle, polygon). This is a join > >> query to list all intersected pairs of these two datasets: > >> ``` > >> > >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE > >> spatial_intersect(ps.geom, ls.geom);* > >> *```* > >> > >> I rewrote this query in a more complicated SQL++ query(*) but has a > better > >> performance. > >> > >> I took a look at FuzzyJoinRule > >> < > >> > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >> implementation > >> < > >> > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >> (a > >> rule for AQL+) in AsterixDB and I'm trying to make a similar > implementation > >> called SpatalJoinRule > >> < > >> > https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java > >> (a > >> rule for SQL++). Now I'm stuck with some questions and I would > >> really appreciate if you can help me to answer them: > >> > >> 1. What is the syntax to make placeholders for input operators and > >> variables in a SQL++ template? In the FuzzyJoinRule > >> < > >> > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >> implementation > >> < > >> > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >>> , > >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for > >> left input variable. Is it similar in SQL++? > >> > >> 2. Does the query template for substitution must be a (full) compilable > >> query? Or it can be just a part of a full query? > >> > >> 3. Can the substitute query contain UDF functions? > >> > >> Thanks, > >> > >> Tin > >> > >> (*) Rewitten query: in short, it partitioned the datasets by a grid (red > >> statement) then we compute the intersection pairs for each cell of the > >> grid. > >> - 'mytileids' UDF function returns the corresponding cell ID of a > spatial > >> object. > >> - 'referencepointtileid' UDF function is used to ignore the > duplications in > >> the final result. > >> - (-180.0,83.0,180.0,90.0) is space MBR. > >> - (100,100) is grid size (rows x columns). > >> > >> ``` > >> > >> > >> > >> > >> > >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1, > >> lakesPartitioned.tile as tile2, > >> test#referencepointtileid(parksPartitioned.geom, > >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as > >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM > >> (SELECT > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps > FROM > >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS > parksPartitioned,(SELECT > >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps > FROM > >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE > >> parksPartitioned.tile = lakesPartitioned.tileAND > >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS > result > >> WHERE result.tile1 = result.ref_tile;* > >> *```* > >> >
