Hi Tin,

Here is a paper that explains FuzzyJoinRule in general. Hope this helps.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>
implementation
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>,
I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?
You need to implement an extended version of SQL++ (maybe SQL+++) that
recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?
It has to be a complete query that is executable in SQL+++ (if you create
one).

3. Can the substitute query contain UDF functions?
Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an extended
version.

Best,
Taewoo


On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote:

> Hi all,
>
> I'm working on a project which aims to improve performance of spatial join
> query in AsterixDB.
>
> The problem can be described as follows:
>
> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
> geom is a spatial data type (point, rectangle, polygon). This is a join
> query to list all intersected pairs of these two datasets:
> ```
>
> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> spatial_intersect(ps.geom, ls.geom);*
> *```*
>
> I rewrote this query in a more complicated SQL++ query(*) but has a better
> performance.
>
> I took a look at FuzzyJoinRule
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> implementation
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> (a
> rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
> called SpatalJoinRule
> <
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> >
> (a
> rule for SQL++). Now I'm stuck with some questions and I would
> really appreciate if you can help me to answer them:
>
> 1. What is the syntax to make placeholders for input operators and
> variables in a SQL++ template? In the FuzzyJoinRule
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> implementation
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >,
> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> left input variable. Is it similar in SQL++?
>
> 2. Does the query template for substitution must be a (full) compilable
> query? Or it can be just a part of a full query?
>
> 3. Can the substitute query contain UDF functions?
>
> Thanks,
>
> Tin
>
> (*) Rewitten query: in short, it partitioned the datasets by a grid (red
> statement) then we compute the intersection pairs for each cell of the
> grid.
> - 'mytileids' UDF function returns the corresponding cell ID of a spatial
> object.
> - 'referencepointtileid' UDF function is used to ignore the duplications in
> the final result.
> - (-180.0,83.0,180.0,90.0) is space MBR.
> - (100,100) is grid size (rows x columns).
>
> ```
>
>
>
>
>
> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> lakesPartitioned.tile as tile2,
> test#referencepointtileid(parksPartitioned.geom,
> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> (SELECT
> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> parksPartitioned.tile = lakesPartitioned.tileAND
> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
> WHERE result.tile1 = result.ref_tile;*
> *```*
>

Reply via email to