albertobastos opened a new pull request, #15188:
URL: https://github.com/apache/pinot/pull/15188

   The query plan visualization was incorrect in some cases due to non-unique 
internal ids generated for each node. The outcome was some weird relationships 
between stages that did not match the ones from the textual plan.
   
   This PR addresses this issue taking into account the node parent to create 
an actual unique identifier for the whole tree.
   
   A sample query that got fixed after this change is the next one (run a local 
cluster with the `ColocatedJoinEngineQuickStart` to try it):
   
   ```sql
   SET useSpools = true;
   WITH
       q1 AS (
           SELECT userUUID,
                  deviceOS,
                  totalTrips
           FROM userAttributes
       ),
        q2 AS (
            SELECT userUUID,
                   deviceOS,
                   SUM(totalTrips) AS totalTrips,
                   COUNT(DISTINCT userUUID) AS reach
            FROM q1
            GROUP BY userUUID,
                     deviceOS
        ),
        q3 AS (
            SELECT userUUID,
                   (totalTrips / reach) AS frequency
            FROM q2
        ),
        q4 AS (
            SELECT rd.userUUID,
                   rd.deviceOS,
                   rd.totalTrips as totalTrips,
                   rd.reach AS reach
            FROM q2 rd
        ),
        q5 AS (
            SELECT userUUID,
                   SUM(totalTrips) AS totalTrips
            FROM q4
            GROUP BY userUUID
        ),
        q6 AS (
            SELECT s.userUUID,
                   s.totalTrips,
                   (s.totalTrips / o.frequency) AS reach,
                   'Traditional TV + OTT' AS deviceOS
            FROM q5 s
                     JOIN q3 o ON s.userUUID = o.userUUID
        ),
        q7 AS (
            SELECT rd.userUUID,
                   rd.totalTrips,
                   rd.reach,
                   rd.deviceOS
            FROM q4 rd
            UNION ALL
            SELECT f.userUUID,
                   f.totalTrips,
                   f.reach,
                   f.deviceOS
            FROM q6 f
        ),
        q8 AS (
            SELECT sd.*
            FROM q7 sd
                     JOIN (
                SELECT deviceOS,
                       PERCENTILETDigest(totalTrips, 20) AS p20
                FROM q7
                GROUP BY deviceOS
            ) q ON sd.deviceOS = q.deviceOS
        )
   SELECT *
   FROM q8
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to