Tamar-Posen opened a new issue, #18850:
URL: https://github.com/apache/datafusion/issues/18850

   ### Describe the bug
   
   In DataFusion 51.1.0, joins involving a `GROUP BY` may choose the wrong 
build/probe side.
   This happens because `AggregateExec` discards `total_byte_size` statistics, 
leaving only row-count stats.
   The join-selection rule then falls back to comparing `num_rows`, which 
producing incorrect decisions for tables with uneven byte-size distributions
   
   As a result, dynamic filters may be:
   - Built on the large table, or
   - Not generated at all.
   
   ### To Reproduce
   
   This occurs when joining tables with extreme size asymmetry:
   
   - large_bytes: ~50 rows, ~1 GB total
   - many_rows: ~1,000,000 rows, ~50 KB total
   
   Run:
   ```
   SELECT *
   FROM large_bytes
   JOIN (
     SELECT id, join_key
     FROM many_rows
     GROUP BY ALL
   ) AS many_rows
   ON large_bytes.id = many_rows.join_key;
   
   ```
   
   Inspect the optimized plan (`EXPLAIN` or `df.explain(false, true)`).
   The optimizer may select `large_bytes` as the build side because the 
aggregated `many_rows` loses its byte-size stats.
   
   Minimal Rust test reproducing the issue: 
[repro_aggregate_bytes.rs](https://gist.github.com/Tamar-Posen/758bca567c9a591e0b9e5d0a1fa53633)
   
   ### Expected behavior
   
   - AggregateExec should preserve or approximate total_byte_size.
   - Join selection should use byte-size when available.
   - Dynamic filters should consistently build on the smaller table.
   
   ### Additional context
   
   Relevant areas:
   
   - physical-plan/aggregates/mod.rs – statistics propagation
   - physical-optimizer/join_selection.rs
   - physical-optimizer/filter_pushdown.rs
   - physical-plan/joins/hash_join/exec.rs
   
   This behavior appears to be a limitation (design choice) rather than an 
incidental bug. Preserving or estimating byte-size stats after aggregation 
would improve join selection for skewed row-size distributions
   
   I’m happy to submit a PR after discussing the desired approach for 
propagating or estimating byte-size stats through AggregateExec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to