andygrove opened a new pull request, #3397:
URL: https://github.com/apache/datafusion-comet/pull/3397

   ## Summary
   
   This PR fixes core correctness issues with windowed aggregate queries by 
adding an explicit `SortExec` before `BoundedWindowAggExec` when ORDER BY is 
present.
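
To illustrate why input order matters, here is a minimal sketch in plain Rust. It does not use any DataFusion or Comet types; `running_sum` and the sample rows are hypothetical. It models a streaming operator that computes `SUM(v) OVER (ORDER BY k)` while trusting its input to already be sorted by `k`.

```rust
/// Streaming running sum: emits the cumulative total of `v` in arrival order.
/// This models a window operator that assumes its input is pre-sorted by `k`.
fn running_sum(rows: &[(i32, i64)]) -> Vec<(i32, i64)> {
    let mut total = 0i64;
    rows.iter()
        .map(|&(k, v)| {
            total += v;
            (k, total)
        })
        .collect()
}

fn main() {
    // Hypothetical (k, v) rows arriving out of order.
    let unsorted: Vec<(i32, i64)> = vec![(3, 30), (1, 10), (2, 20)];

    // Same rows, sorted by the ORDER BY key before the window operator runs.
    let mut sorted = unsorted.clone();
    sorted.sort_by_key(|&(k, _)| k);

    println!("without sort: {:?}", running_sum(&unsorted));
    println!("with sort:    {:?}", running_sum(&sorted));
}
```

Without the sort, the row with `k = 1` gets a running total of 40 instead of 10, which is the kind of incorrect result an explicit sort in front of the window operator prevents.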
   
   **Tracking Issue:** #2721
   
   ## Changes
   
   1. **Add explicit SortExec** (`planner.rs`) - Insert a `SortExec` before 
`BoundedWindowAggExec` when ORDER BY is present, ensuring the 
`InputOrderMode::Sorted` requirement is satisfied
   
   2. **Improve support level detection** (`CometWindowExec.scala`) - Change 
from blanket `Incompatible` to `Compatible` for valid cases, with proper 
validation that partition expressions must be a subset of order expressions
   
   3. **Disable by default** (`CometConf.scala`) - Set 
`spark.comet.exec.window.enabled=false` to avoid changing behavior for existing 
users; users can opt in to test
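
The planner change in item 1 can be pictured with a toy model. This is only a sketch under stated assumptions: the `Plan` enum and `plan_window` function are hypothetical stand-ins, not the DataFusion `SortExec` / `BoundedWindowAggExec` API, and the sketch assumes the inserted sort uses the ORDER BY keys.

```rust
/// Toy physical plan. These variants are hypothetical stand-ins for the real
/// DataFusion operators and carry only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { keys: Vec<String>, input: Box<Plan> },
    Window { partition_by: Vec<String>, order_by: Vec<String>, input: Box<Plan> },
}

/// Build a window node, wrapping the child in a sort only when ORDER BY is present.
fn plan_window(input: Plan, partition_by: Vec<String>, order_by: Vec<String>) -> Plan {
    let input = if order_by.is_empty() {
        // No ORDER BY: nothing to sort on, so the child is used as-is.
        input
    } else {
        // Assumption: the sort keys are the window's ORDER BY keys.
        Plan::Sort { keys: order_by.clone(), input: Box::new(input) }
    };
    Plan::Window { partition_by, order_by, input: Box::new(input) }
}

fn main() {
    let ordered = plan_window(Plan::Scan, vec![], vec!["x".to_string()]);
    let unordered = plan_window(Plan::Scan, vec!["x".to_string()], vec![]);
    println!("{:?}", ordered);
    println!("{:?}", unordered);
}
```

In this model, the window node over `ORDER BY x` reads from a sort node, while the partition-only window reads directly from the scan.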
   
   ## What's Now Supported (when enabled)
   
   - Window aggregates: `COUNT`, `SUM`, `MIN`, `MAX`
   - `OVER()` - no partition, no order
   - `OVER(ORDER BY x)` - order only
   - `OVER(PARTITION BY x)` - partition only
   - `OVER(PARTITION BY x ORDER BY x, y)` - partition is subset of order
   
   ## What's NOT Supported (falls back to Spark)
   
   - `PARTITION BY a ORDER BY b` where the partition columns are not a subset 
of the order columns
   - `AVG` window aggregate (native implementation has known issues)
   - Ranking functions: `ROW_NUMBER`, `RANK`, `DENSE_RANK`, etc.
   - Offset functions: `LAG`, `LEAD`
   - Value functions: `FIRST_VALUE`, `LAST_VALUE`, `NTH_VALUE`
   - `RANGE BETWEEN` with numeric/temporal expressions (#1246)
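
The partition/order rule implied by the two lists above can be sketched as a small predicate. This is one reading of those lists, not the actual check in `CometWindowExec.scala` (which is Scala and operates on Spark expressions); `is_supported_spec` is a hypothetical name, and columns are modelled as plain strings.

```rust
/// Returns true when a window spec matches the supported shapes listed above:
/// no ORDER BY at all, or every PARTITION BY column also appears in ORDER BY.
/// Assumption: columns are compared by name only.
fn is_supported_spec(partition_by: &[&str], order_by: &[&str]) -> bool {
    order_by.is_empty() || partition_by.iter().all(|p| order_by.contains(p))
}

fn main() {
    // (description, PARTITION BY columns, ORDER BY columns)
    let cases: [(&str, &[&str], &[&str]); 5] = [
        ("OVER()", &[], &[]),
        ("OVER(ORDER BY x)", &[], &["x"]),
        ("OVER(PARTITION BY x)", &["x"], &[]),
        ("OVER(PARTITION BY x ORDER BY x, y)", &["x"], &["x", "y"]),
        ("OVER(PARTITION BY a ORDER BY b)", &["a"], &["b"]),
    ];
    for (sql, partition_by, order_by) in cases {
        println!("{sql}: supported = {}", is_supported_spec(partition_by, order_by));
    }
}
```

Only the last case is rejected in this sketch, matching the first entry in the unsupported list; the function-level restrictions (for example `AVG` or ranking functions) are outside what it models.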
   
   ## Test Plan
   
   - [x] All existing window tests pass (14 tests)
   - [x] Enabled "aggregate window function for all types" test that was 
previously ignored
   - [x] Added new tests for partition-subset-of-order validation
   - [x] No golden file updates needed (feature disabled by default)
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

