Hi Devs,
I have a few questions about the query optimizer.
*- Given the query:*
use dataverse TwitterDataverse;
for $x in dataset Tweets
where $x.name = "Alice"
let $geo := $x.geo
group by $name:=$x.name with $geo
return {"name": $name, "geo":$geo[0].coordinates.coordinates}
*- Logical Plan:*
distribute result [$$10] -- |UNPARTITIONED|
project ([$$10]) -- |UNPARTITIONED|
assign [$$10] <- [{"name": $$name, "geo": get-item($$9, 0).getField("coordinates").getField("coordinates")}] -- |UNPARTITIONED|
group by ([$$name := $$x.getField("name")]) decor ([]) {
aggregate [$$9] <- [listify($$geo)] -- |UNPARTITIONED|
nested tuple source -- |UNPARTITIONED|
} -- |UNPARTITIONED|
assign [$$geo] <- [$$x.getField("geo")] -- |UNPARTITIONED|
select (eq($$x.getField("name"), "Alice")) -- |UNPARTITIONED|
unnest $$x <- dataset("Tweets") -- |UNPARTITIONED|
empty-tuple-source -- |UNPARTITIONED|
*- Optimized Logical Plan:*
distribute result [$$10]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$10])
-- STREAM_PROJECT |PARTITIONED|
assign [$$10] <- [{"name": $$name, "geo": $$19.getField("coordinates")}]
-- ASSIGN |PARTITIONED|
project ([$$name, $$19])
-- STREAM_PROJECT |PARTITIONED|
assign [$$19, $$22] <- [get-item($$9, 0).getField("coordinates"), get-item($$9, 0)]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
group by ([$$name := $$15]) decor ([]) {
aggregate [$$9] <- [listify($$geo)]
-- AGGREGATE |LOCAL|
nested tuple source
-- NESTED_TUPLE_SOURCE |LOCAL|
}
-- PRE_CLUSTERED_GROUP_BY[$$15] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$15)
-- STABLE_SORT [$$15(ASC)] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$15] |PARTITIONED|
select (eq($$15, "Alice"))
-- STREAM_SELECT |PARTITIONED|
project ([$$geo, $$15])
-- STREAM_PROJECT |PARTITIONED|
assign [$$geo, $$15] <- [$$x.getField("geo"), $$x.getField("name")]
-- ASSIGN |PARTITIONED|
project ([$$x])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$16, $$x] <- TwitterDataverse.Tweets
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
*- Questions:*
$$22:
- Why is the variable $$22 produced even though it is never used? Is it
just a harmless bug, or is there some intuition I am missing?
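To make the observation concrete, here is a rough Python sketch (a toy model, not Algebricks code) that walks the top four operators of the optimized plan from consumer to producer and reports any variable that is assigned but never referenced by an operator above it. The PLAN list and the unused_variables helper are my own illustrative names; the expressions are copied from the plan.

```python
import re

# Top of the optimized plan, listed consumer-first.
# Each entry: (operator, variables it produces, expression text it evaluates).
PLAN = [
    ("project", [], "$$10"),
    ("assign", ["$$10"],
     '{"name": $$name, "geo": $$19.getField("coordinates")}'),
    ("project", [], "$$name, $$19"),
    ("assign", ["$$19", "$$22"],
     'get-item($$9, 0).getField("coordinates"), get-item($$9, 0)'),
]

VAR = re.compile(r"\$\$\w+")  # matches plan variables such as $$19 or $$name


def unused_variables(plan):
    """Return produced variables that no operator above them references."""
    used = set()
    unused = []
    for _op, produced, expr in plan:
        # A produced variable is live only if some consumer above used it.
        for var in produced:
            if var not in used:
                unused.append(var)
        used |= set(VAR.findall(expr))
    return unused


print(unused_variables(PLAN))
```

Under this simplified model only $$22 is flagged: it is produced by the lower assign, but the project directly above keeps just $$name and $$19.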
$$19:
- It seems that (sometimes) chained getField calls are split across
operators. Is there a reason for that? (There is another example that
reproduces the same behavior.)
- That leads to my next question: I see no rule for "FieldAccessNested",
which could be exploited here to save a few function calls. Could this
function interfere with other functions or access methods?
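To illustrate the kind of rewrite I have in mind, here is a small Python sketch that collapses a chain of getField calls into a single nested access with a field path. It is string manipulation only, not an actual optimizer rule; fuse_field_accesses is a hypothetical helper, and the field-access-nested(base, [path]) output notation is an assumption made for illustration.

```python
import re

# One ".getField("name")" step, capturing the field name.
STEP = re.compile(r'\.getField\("([^"]+)"\)')
# A base expression followed by one or more getField steps up to the end.
CHAIN = re.compile(r'^(?P<base>.*?)(?P<steps>(?:\.getField\("[^"]+"\))+)$')


def fuse_field_accesses(expr):
    """Rewrite base.getField("a").getField("b") as one nested access.

    Expressions with fewer than two chained getField calls are returned
    unchanged, since there is nothing to fuse.
    """
    match = CHAIN.match(expr)
    if match is None:
        return expr
    fields = STEP.findall(match.group("steps"))
    if len(fields) < 2:
        return expr
    path = ", ".join(f'"{name}"' for name in fields)
    return f'field-access-nested({match.group("base")}, [{path}])'


original = 'get-item($$9, 0).getField("coordinates").getField("coordinates")'
print(fuse_field_accesses(original))
```

With a rewrite like this, the two getField calls from the original logical plan would become one function call over the path ["coordinates", "coordinates"], instead of being split between $$19 and the final assign.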
--
*Regards,*
Wail Alkowaileet