Hi Wail,
$$22 should be a harmless bug -- it's related to the ordering of the rules.
For $$19: we could potentially add a rule for that.
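
As a rough illustration of what such a rule might do, here is a toy sketch in Python. It is not AsterixDB or Algebricks code: the expression classes and the collapse_field_access function are hypothetical names made up for this example, and "FieldAccessNested" is simply borrowed from the question below. The idea is to fold a chain of getField calls into one nested access that carries the whole field path.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, minimal expression tree (not the real optimizer classes).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class GetItem:
    source: "Expr"
    index: int


@dataclass(frozen=True)
class GetField:
    source: "Expr"
    field: str


@dataclass(frozen=True)
class FieldAccessNested:
    source: "Expr"
    path: Tuple[str, ...]


Expr = Union[Var, GetItem, GetField, FieldAccessNested]


def collapse_field_access(expr: Expr) -> Expr:
    """Fold a chain of GetField nodes into a single FieldAccessNested."""
    if isinstance(expr, GetField):
        path = []
        node = expr
        # Walk down the chain, collecting field names from outermost to innermost.
        while isinstance(node, GetField):
            path.append(node.field)
            node = node.source
        path.reverse()  # innermost field comes first in the access path
        base = collapse_field_access(node)
        if len(path) == 1:
            # A single field access gains nothing from the nested form.
            return GetField(base, path[0])
        return FieldAccessNested(base, tuple(path))
    if isinstance(expr, GetItem):
        return GetItem(collapse_field_access(expr.source), expr.index)
    return expr


# get-item($$9, 0).getField("coordinates").getField("coordinates")
chained = GetField(GetField(GetItem(Var("$$9"), 0), "coordinates"), "coordinates")
collapsed = collapse_field_access(chained)
```

Here the two getField calls on get-item($$9, 0) end up as one nested access with the path ("coordinates", "coordinates"). A real rule would also have to check that the intermediate result (like $$19) is not used anywhere else before merging.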
Best,
Yingyi
On Sat, Jun 24, 2017 at 5:50 PM, Wail Alkowaileet wrote:
> Hi Devs,
>
> I have a few questions about the query optimizer.
>
> *- Given the query:*
> use dataverse TwitterDataverse
>
> for $x in dataset Tweets
> where $x.name = "Alice"
> let $geo := $x.geo
> group by $name:=$x.name with $geo
> return {"name": $name, "geo":$geo[0].coordinates.coordinates}
>
> *- Logical Plan:*
> distribute result [$$10] -- |UNPARTITIONED|
> project ([$$10]) -- |UNPARTITIONED|
> assign [$$10] <- [{"name": $$name, "geo": get-item($$9, 0).getField("coordinates").getField("coordinates")}] -- |UNPARTITIONED|
> group by ([$$name := $$x.getField("name")]) decor ([]) {
> aggregate [$$9] <- [listify($$geo)] -- |UNPARTITIONED|
> nested tuple source -- |UNPARTITIONED|
> } -- |UNPARTITIONED|
> assign [$$geo] <- [$$x.getField("geo")] -- |UNPARTITIONED|
> select (eq($$x.getField("name"), "Alice")) -- |UNPARTITIONED|
> unnest $$x <- dataset("Tweets") -- |UNPARTITIONED|
> empty-tuple-source -- |UNPARTITIONED|
>
> *- Optimized Logical Plan:*
> distribute result [$$10]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$10])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$10] <- [{"name": $$name, "geo": $$19.getField("coordinates")}]
> -- ASSIGN |PARTITIONED|
> project ([$$name, $$19])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$19, $$22] <- [get-item($$9, 0).getField("coordinates"), get-item($$9, 0)]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$name := $$15]) decor ([]) {
> aggregate [$$9] <- [listify($$geo)]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$15] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, $$15)
> -- STABLE_SORT [$$15(ASC)] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$15] |PARTITIONED|
> select (eq($$15, "Alice"))
> -- STREAM_SELECT |PARTITIONED|
> project ([$$geo, $$15])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$geo, $$15] <- [$$x.getField("geo"), $$x.getField("name")]
> -- ASSIGN |PARTITIONED|
> project ([$$x])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$16, $$x] <- TwitterDataverse.Tweets
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
>
> *- Questions:*
> $$22:
>
> - Why is the variable $$22 produced, although there is no use for it? Is
> it just a harmless bug, or is there some intuition I might be missing?
>
> $$19:
>
> - It seems that getField function calls are (sometimes) split. Is there a
> reason for that? (There's another example that reproduces the same
> behavior.)
> - That leads to my next question: I see no rule for "FieldAccessNested",
> which could be exploited here to save a few function calls. Can this
> function interfere with other functions or access methods?
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>