Thanks Mike and Taewoo for your comments. I'll give it a try. Tin
On Mon, Aug 24, 2020 at 12:17 PM Taewoo Kim <[email protected]> wrote: > Agreed with Mike regarding a native approach to spatial joins. :-) I just > tried to point to a location for FuzzyJoinRule. > > Best, > Taewoo > > > On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <[email protected]> wrote: > > > Tin, > > > > 1. There is no support for SQL++ in templates. Just AQL. And AQL is now > > deprecated, in a mode where it is only supported to keep that one fuzzy > > join rule alive via AQL+. The most recent public release of AsterixDB > > is the last one where AQL+ will work - and then AQL will completely > > disappear from the code base (as will the scripting framework) going > > forward, as it is totally unsupported. (Its inventor left with his PhD > > in 2011. Taewoo Kim, below, did a really nice refactoring revision of > > the framework more recently, to remove lots of replicated code between > > AQL and AQL+, but he is also no longer focused on AsterixDB work - his > > post-PhD day job seems to be a distraction. :-)) > > > > 2. More information about the scripting framework and its use for the > > fuzzy join rule is in Chapter 5 of the aforementioned PhD student's > > thesis: > > http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf. > > > > 3. I suspect it would be significantly messy work to create a SQL++ > > alternative to AQL as the basis for the template framework... If you > > want to explore a template-based approach, I'd stick with AQL+ and just > > be aware that it's kind of a "dead end" approach long-term in the > > AsterixDB code base - but could be explored as a research branch using > > 0.9.5 as the branching point. > > > > I think a native approach to spatial joins would be better than a > > template-based approach - along the lines of what Preston Carmon et al > > are doing for interval joins up at Walla Walla U in Washington state. > > > > Cheers, > > > > Mike > > > > On 8/24/20 11:11 AM, Taewoo Kim wrote: > > > Hi Tin, > > > > > > Here is a paper that explains FuzzyJoinRule in general. Hope this > helps. > > > > > > > > > https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG= > > > > > > 1. What is the syntax to make placeholders for input operators and > > > variables in a SQL++ template? In the FuzzyJoinRule > > > < > > > > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > > implementation > > > < > > > > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > >> , > > > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 > for > > > left input variable. Is it similar in SQL++? > > > You need to implement an extended version of SQL++ (maybe SQL+++) that > > > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+. > > > > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj > > > > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java > > > > > > 2. Does the query template for substitution must be a (full) compilable > > > query? Or it can be just a part of a full query? > > > It has to be a complete query that is executable in SQL+++ (if you > create > > > one). > > > > > > 3. Can the substitute query contain UDF functions? > > > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an > > extended > > > version. > > > > > > Best, > > > Taewoo > > > > > > > > > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote: > > > > > >> Hi all, > > >> > > >> I'm working on a project which aims to improve performance of spatial > > join > > >> query in AsterixDB. > > >> > > >> The problem can be described as follows: > > >> > > >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) > > with > > >> geom is a spatial data type (point, rectangle, polygon). This is a > join > > >> query to list all intersected pairs of these two datasets: > > >> ``` > > >> > > >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE > > >> spatial_intersect(ps.geom, ls.geom);* > > >> *```* > > >> > > >> I rewrote this query in a more complicated SQL++ query(*) but has a > > better > > >> performance. > > >> > > >> I took a look at FuzzyJoinRule > > >> < > > >> > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > >> implementation > > >> < > > >> > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > >> (a > > >> rule for AQL+) in AsterixDB and I'm trying to make a similar > > implementation > > >> called SpatalJoinRule > > >> < > > >> > > > https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java > > >> (a > > >> rule for SQL++). Now I'm stuck with some questions and I would > > >> really appreciate if you can help me to answer them: > > >> > > >> 1. What is the syntax to make placeholders for input operators and > > >> variables in a SQL++ template? In the FuzzyJoinRule > > >> < > > >> > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > >> implementation > > >> < > > >> > > > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > >>> , > > >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 > for > > >> left input variable. Is it similar in SQL++? > > >> > > >> 2. Does the query template for substitution must be a (full) > compilable > > >> query? Or it can be just a part of a full query? > > >> > > >> 3. Can the substitute query contain UDF functions? > > >> > > >> Thanks, > > >> > > >> Tin > > >> > > >> (*) Rewitten query: in short, it partitioned the datasets by a grid > (red > > >> statement) then we compute the intersection pairs for each cell of the > > >> grid. > > >> - 'mytileids' UDF function returns the corresponding cell ID of a > > spatial > > >> object. > > >> - 'referencepointtileid' UDF function is used to ignore the > > duplications in > > >> the final result. > > >> - (-180.0,83.0,180.0,90.0) is space MBR. > > >> - (100,100) is grid size (rows x columns). > > >> > > >> ``` > > >> > > >> > > >> > > >> > > >> > > >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1, > > >> lakesPartitioned.tile as tile2, > > >> test#referencepointtileid(parksPartitioned.geom, > > >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as > > >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM > > >> (SELECT > > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps > > FROM > > >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS > > parksPartitioned,(SELECT > > >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT > > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps > > FROM > > >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE > > >> parksPartitioned.tile = lakesPartitioned.tileAND > > >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS > > result > > >> WHERE result.tile1 = result.ref_tile;* > > >> *```* > > >> > > >
