Substitute SQL++ sub-query

Tin Vu Mon, 24 Aug 2020 10:56:43 -0700

Hi all,

I'm working on a project which aims to improve performance of spatial join
query in AsterixDB.


The problem can be described as follows:

Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
geom is a spatial data type (point, rectangle, polygon). This is a join
query to list all intersected pairs of these two datasets:
```

*SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
spatial_intersect(ps.geom, ls.geom);*
*```*

I rewrote this query in a more complicated SQL++ query(*) but has a better
performance.

I took a look at FuzzyJoinRule
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
implementation
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
(a
rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
called SpatalJoinRule
<https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java>
(a
rule for SQL++). Now I'm stuck with some questions and I would
really appreciate if you can help me to answer them:

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
implementation
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>,
I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?

3. Can the substitute query contain UDF functions?

Thanks,

Tin

(*) Rewitten query: in short, it partitioned the datasets by a grid (red
statement) then we compute the intersection pairs for each cell of the
grid.
- 'mytileids' UDF function returns the corresponding cell ID of a spatial
object.
- 'referencepointtileid' UDF function is used to ignore the duplications in
the final result.
- (-180.0,83.0,180.0,90.0) is space MBR.
- (100,100) is grid size (rows x columns).

```





*SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
lakesPartitioned.tile as tile2,
test#referencepointtileid(parksPartitioned.geom,
lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
parksPartitioned.tile = lakesPartitioned.tileAND
spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
WHERE result.tile1 = result.ref_tile;*
*```*

Substitute SQL++ sub-query

Reply via email to