Hi Tin, Here is a paper that explains FuzzyJoinRule in general. Hope this helps.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG= 1. What is the syntax to make placeholders for input operators and variables in a SQL++ template? In the FuzzyJoinRule < https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > implementation < https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java >, I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for left input variable. Is it similar in SQL++? You need to implement an extended version of SQL++ (maybe SQL+++) that recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+. https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java 2. Does the query template for substitution must be a (full) compilable query? Or it can be just a part of a full query? It has to be a complete query that is executable in SQL+++ (if you create one). 3. Can the substitute query contain UDF functions? Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an extended version. Best, Taewoo On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote: > Hi all, > > I'm working on a project which aims to improve performance of spatial join > query in AsterixDB. > > The problem can be described as follows: > > Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with > geom is a spatial data type (point, rectangle, polygon). This is a join > query to list all intersected pairs of these two datasets: > ``` > > *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE > spatial_intersect(ps.geom, ls.geom);* > *```* > > I rewrote this query in a more complicated SQL++ query(*) but has a better > performance. > > I took a look at FuzzyJoinRule > < > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > > implementation > < > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > > (a > rule for AQL+) in AsterixDB and I'm trying to make a similar implementation > called SpatalJoinRule > < > https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java > > > (a > rule for SQL++). Now I'm stuck with some questions and I would > really appreciate if you can help me to answer them: > > 1. What is the syntax to make placeholders for input operators and > variables in a SQL++ template? In the FuzzyJoinRule > < > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > > > implementation > < > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java > >, > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for > left input variable. Is it similar in SQL++? > > 2. Does the query template for substitution must be a (full) compilable > query? Or it can be just a part of a full query? > > 3. Can the substitute query contain UDF functions? > > Thanks, > > Tin > > (*) Rewitten query: in short, it partitioned the datasets by a grid (red > statement) then we compute the intersection pairs for each cell of the > grid. > - 'mytileids' UDF function returns the corresponding cell ID of a spatial > object. > - 'referencepointtileid' UDF function is used to ignore the duplications in > the final result. > - (-180.0,83.0,180.0,90.0) is space MBR. > - (100,100) is grid size (rows x columns). > > ``` > > > > > > *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1, > lakesPartitioned.tile as tile2, > test#referencepointtileid(parksPartitioned.geom, > lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as > ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM > (SELECT > test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM > ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT > tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT > test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM > LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE > parksPartitioned.tile = lakesPartitioned.tileAND > spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result > WHERE result.tile1 = result.ref_tile;* > *```* >
