albertobastos opened a new pull request, #15188:
URL: https://github.com/apache/pinot/pull/15188
The query plan visualization was incorrect in some cases due to non-unique
internal ids generated for each node. The outcome was some weird relationships
between stages that did not match the ones from the textual plan.
This PR addresses this issue taking into account the node parent to create
an actual unique identifier for the whole tree.
A sample query that got fixed after this change is the next one (run a local
cluster with the `ColocatedJoinEngineQuickStart` to try it):
```sql
SET useSpools = true;
WITH
q1 AS (
SELECT userUUID,
deviceOS,
totalTrips
FROM userAttributes
),
q2 AS (
SELECT userUUID,
deviceOS,
SUM(totalTrips) AS totalTrips,
COUNT(DISTINCT userUUID) AS reach
FROM q1
GROUP BY userUUID,
deviceOS
),
q3 AS (
SELECT userUUID,
(totalTrips / reach) AS frequency
FROM q2
),
q4 AS (
SELECT rd.userUUID,
rd.deviceOS,
rd.totalTrips as totalTrips,
rd.reach AS reach
FROM q2 rd
),
q5 AS (
SELECT userUUID,
SUM(totalTrips) AS totalTrips
FROM q4
GROUP BY userUUID
),
q6 AS (
SELECT s.userUUID,
s.totalTrips,
(s.totalTrips / o.frequency) AS reach,
'Traditional TV + OTT' AS deviceOS
FROM q5 s
JOIN q3 o ON s.userUUID = o.userUUID
),
q7 AS (
SELECT rd.userUUID,
rd.totalTrips,
rd.reach,
rd.deviceOS
FROM q4 rd
UNION ALL
SELECT f.userUUID,
f.totalTrips,
f.reach,
f.deviceOS
FROM q6 f
),
q8 AS (
SELECT sd.*
FROM q7 sd
JOIN (
SELECT deviceOS,
PERCENTILETDigest(totalTrips, 20) AS p20
FROM q7
GROUP BY deviceOS
) q ON sd.deviceOS = q.deviceOS
)
SELECT *
FROM q8
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]