nathanb9 commented on PR #23175:
URL: https://github.com/apache/datafusion/pull/23175#issuecomment-4806406668

   # Query / Computation Fusion Optimizer Ideas
   
   Inspired by Athena's computation-reuse fusion paper.
   
   ## 1. Scalar Aggregate Fusion
   
   Merge multiple scalar aggregate subqueries over the same source into one 
aggregate with `FILTER` clauses.
   
   Example query: TPC-DS Q9
   
   Before:
   
   ```sql
   SELECT
     (SELECT avg(x) FROM R WHERE p1),
     (SELECT avg(y) FROM R WHERE p2);
   ```
   
   After:
   
   ```sql
   SELECT
     avg(x) FILTER (WHERE p1),
     avg(y) FILTER (WHERE p2)
   FROM R;
   ```
   
   ## 2. Grouped Aggregate Join To Window Aggregate
   
   Replace a join back to a grouped aggregate over the same source with a 
window aggregate.
   
   Example query: TPC-DS Q65
   
   Before:
   
   ```sql
   SELECT R.*, A.avg_v
   FROM R
   JOIN (
     SELECT k, avg(v) AS avg_v
     FROM R
     GROUP BY k
   ) A
   ON R.k = A.k;
   ```
   
   After:
   
   ```sql
   SELECT
     R.*,
     avg(v) OVER (PARTITION BY k) AS avg_v
   FROM R
   WHERE k IS NOT NULL;
   ```
   
   ## 3. UNION ALL Branch Fusion
   
   Merge same-source `UNION ALL` branches into one scan plus branch tags and 
branch-specific predicates.
   
   Example query: TPC-DS Q23-style shapes
   
   Before:
   
   ```sql
   SELECT ... FROM R WHERE p1
   UNION ALL
   SELECT ... FROM R WHERE p2;
   ```
   
   After:
   
   ```sql
   SELECT ...
   FROM R
   CROSS JOIN branches
   WHERE (branch = 1 AND p1)
      OR (branch = 2 AND p2);
   ```
   
   ## 4. Multi-Aggregate Common-Source Fusion
   
   Merge sibling aggregates over the same source and group keys into one 
aggregate with multiple aggregate expressions.
   
   Example queries: TPC-DS Q28 / Q88
   
   Before:
   
   ```sql
   SELECT ...
   FROM (
     SELECT k, agg1(v) FROM R GROUP BY k
   ) a,
   (
     SELECT k, agg2(v) FROM R GROUP BY k
   ) b;
   ```
   
   After:
   
   ```sql
   SELECT
     k,
     agg1(v),
     agg2(v)
   FROM R
   GROUP BY k;
   ```
   
   Note: Q28-style cases may need `DISTINCT` / `MarkDistinct` handling.
   
   ## 5. Common Join-Input Fusion
   
   Detect joins or subqueries that repeatedly consume the same source and share 
the common input instead of recomputing it.
   
   Example queries: TPC-DS Q64 / Q88
   
   Before:
   
   ```sql
   SELECT ...
   FROM (R JOIN A) x,
        (R JOIN B) y;
   ```
   
   After:
   
   ```sql
   SELECT ...
   FROM R
   JOIN A
   JOIN B;
   ```
   
   This should start conservatively behind a config flag and require semantic 
guards for `NULL`s, `DISTINCT`, outer joins, limits, ordering, and volatile 
expressions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to