Re: Substitute SQL++ sub-query

Mike Carey Mon, 24 Aug 2020 12:10:45 -0700

Tin,

1. There is no support for SQL++ in templates. Just AQL. And AQL is nowdeprecated, in a mode where it is only supported to keep that one fuzzyjoin rule alive via AQL+. The most recent public release of AsterixDBis the last one where AQL+ will work - and then AQL will completelydisappear from the code base (as will the scripting framework) goingforward, as it is totally unsupported. (Its inventor left with his PhDin 2011. Taewoo Kim, below, did a really nice refactoring revision ofthe framework more recently, to remove lots of replicated code betweenAQL and AQL+, but he is also no longer focused on AsterixDB work - hispost-PhD day job seems to be a distraction. :-))

2. More information about the scripting framework and its use for thefuzzy join rule is in Chapter 5 of the aforementioned PhD student'sthesis:http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.

3. I suspect it would be significantly messy work to create a SQL++alternative to AQL as the basis for the template framework... If youwant to explore a template-based approach, I'd stick with AQL+ and justbe aware that it's kind of a "dead end" approach long-term in theAsterixDB code base - but could be explored as a research branch using0.9.5 as the branching point.

I think a native approach to spatial joins would be better than atemplate-based approach - along the lines of what Preston Carmon et alare doing for interval joins up at Walla Walla U in Washington state.


Cheers,

Mike

On 8/24/20 11:11 AM, Taewoo Kim wrote:

Hi Tin,

Here is a paper that explains FuzzyJoinRule in general. Hope this helps.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
implementation
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java

I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?
You need to implement an extended version of SQL++ (maybe SQL+++) that
recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?
It has to be a complete query that is executable in SQL+++ (if you create
one).

3. Can the substitute query contain UDF functions?
Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an extended
version.

Best,
Taewoo


On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <[email protected]> wrote:

Hi all,

I'm working on a project which aims to improve performance of spatial join
query in AsterixDB.

The problem can be described as follows:

Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
geom is a spatial data type (point, rectangle, polygon). This is a join
query to list all intersected pairs of these two datasets:
```

*SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
spatial_intersect(ps.geom, ls.geom);*
*```*

I rewrote this query in a more complicated SQL++ query(*) but has a better
performance.

I took a look at FuzzyJoinRule
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
implementation
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
(a
rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
called SpatalJoinRule
<
https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
(a
rule for SQL++). Now I'm stuck with some questions and I would
really appreciate if you can help me to answer them:

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
implementation
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java

I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?

3. Can the substitute query contain UDF functions?

Thanks,

Tin

(*) Rewitten query: in short, it partitioned the datasets by a grid (red
statement) then we compute the intersection pairs for each cell of the
grid.
- 'mytileids' UDF function returns the corresponding cell ID of a spatial
object.
- 'referencepointtileid' UDF function is used to ignore the duplications in
the final result.
- (-180.0,83.0,180.0,90.0) is space MBR.
- (100,100) is grid size (rows x columns).

```





*SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
lakesPartitioned.tile as tile2,
test#referencepointtileid(parksPartitioned.geom,
lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
(SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
parksPartitioned.tile = lakesPartitioned.tileAND
spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
WHERE result.tile1 = result.ref_tile;*
*```*

Re: Substitute SQL++ sub-query

Reply via email to