pchintar opened a new pull request, #3969:
URL: https://github.com/apache/datafusion-comet/pull/3969

   ## Which issue does this PR close?
   
   Closes #3841.
   
   ## Rationale for this change
   This change adds native support for `MAX_BY` and `MIN_BY`.
   
   These aggregates are commonly used in grouped queries. Without native
   support, they fall back to Spark, which forces the aggregation out of
   Comet's native pipeline. This change lets them run natively, with behavior
   that matches Spark.
   
   
   ## What changes are included in this PR?
   * Added a native implementation for `max_by` and `min_by` (`maxmin_by.rs`)
     using a shared design:
   
     * maintains the current best `(value, ordering)` pair per group
     * updates and merges state using ordering comparison
     * single-pass execution with constant-size state per group
   
   * Implemented `GroupsAccumulator` support to integrate with Comet’s grouped
     aggregation path:
   
     * avoids per-group scalar `Accumulator` instances and their per-row overhead
     * includes specialized handling for primitive, byte/string, and struct
       ordering types, with a row-based fallback for general cases
     * enables execution through `CometHashAggregate` for grouped workloads
   
   * Added serialization and planner wiring:
   
     * proto definitions for `MaxBy` / `MinBy`
     * Spark-side serde (`CometMaxBy`, `CometMinBy`)
     * registration in `QueryPlanSerde`
     * planner support to construct the native aggregate
   
   * Extended operator support:
   
     * enabled execution under `HashAggregate`
     * added support for `SortAggregateExec` when Spark plans a sort-based aggregate
     * ensured both partial and final aggregation stages execute natively
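
The shared per-group design above can be sketched in Rust as follows. This is a minimal model under stated assumptions, not the code in `maxmin_by.rs`. `MaxMinByAcc`, `Slot`, and `Mode` are hypothetical names, plain `Vec`s and `Option`s stand in for Arrow arrays, and orderings are assumed to be `i64`. The sketch keeps one `(value, ordering)` pair per group and skips rows whose ordering is null. On a tie it replaces the stored value, which is the "latest value" behavior noted in the tests. Partial states are merged with the same comparison.

```rust
use std::cmp::Ordering;

/// Which extremum to track: `max_by` keeps the largest ordering, `min_by` the smallest.
#[derive(Clone, Copy, Debug)]
enum Mode {
    Max,
    Min,
}

/// Hypothetical per-group state: the best (value, ordering) pair seen so far.
/// `ordering == None` means no row with a non-null ordering has been seen yet.
#[derive(Clone, Debug, Default)]
struct Slot {
    value: Option<String>,
    ordering: Option<i64>,
}

/// Hypothetical grouped accumulator; a real one would operate on Arrow arrays.
struct MaxMinByAcc {
    mode: Mode,
    slots: Vec<Slot>,
}

impl MaxMinByAcc {
    fn new(mode: Mode) -> Self {
        Self { mode, slots: Vec::new() }
    }

    /// Ties return true, so the most recently offered pair wins.
    fn should_replace(&self, current: Option<i64>, candidate: i64) -> bool {
        match current {
            None => true,
            Some(cur) => match (self.mode, candidate.cmp(&cur)) {
                (_, Ordering::Equal) => true,
                (Mode::Max, Ordering::Greater) | (Mode::Min, Ordering::Less) => true,
                _ => false,
            },
        }
    }

    fn offer(&mut self, group: usize, value: Option<String>, ordering: i64) {
        if self.should_replace(self.slots[group].ordering, ordering) {
            self.slots[group] = Slot { value, ordering: Some(ordering) };
        }
    }

    /// Partial stage: fold input rows into their groups in a single pass.
    fn update_batch(
        &mut self,
        values: &[Option<String>],
        orderings: &[Option<i64>],
        group_indices: &[usize],
        total_groups: usize,
    ) {
        self.slots.resize(total_groups, Slot::default());
        for ((value, ordering), &group) in values.iter().zip(orderings).zip(group_indices) {
            // Rows with a null ordering are skipped; a null value can still win.
            if let Some(ord) = *ordering {
                self.offer(group, value.clone(), ord);
            }
        }
    }

    /// Final stage: merge partial states with the same comparison.
    fn merge_batch(&mut self, partial: &[Slot], group_indices: &[usize], total_groups: usize) {
        self.slots.resize(total_groups, Slot::default());
        for (slot, &group) in partial.iter().zip(group_indices) {
            // A partial state that never saw a non-null ordering contributes nothing.
            if let Some(ord) = slot.ordering {
                self.offer(group, slot.value.clone(), ord);
            }
        }
    }

    /// One result per group; groups with no non-null ordering yield None (SQL NULL).
    fn evaluate(&self) -> Vec<Option<String>> {
        self.slots.iter().map(|s| s.value.clone()).collect()
    }
}

fn main() {
    // Two partial aggregations over groups 0 and 1, then a final merge.
    let mut p1 = MaxMinByAcc::new(Mode::Max);
    p1.update_batch(
        &[Some("a".to_string()), Some("b".to_string()), Some("c".to_string())],
        &[Some(1), None, Some(5)],
        &[0, 0, 1],
        2,
    );
    let mut p2 = MaxMinByAcc::new(Mode::Max);
    p2.update_batch(&[None, Some("d".to_string())], &[Some(3), Some(5)], &[0, 1], 2);

    let mut fin = MaxMinByAcc::new(Mode::Max);
    fin.merge_batch(&p1.slots, &[0, 1], 2);
    fin.merge_batch(&p2.slots, &[0, 1], 2);
    println!("{:?}", fin.evaluate());
}
```

Running `main` prints `[None, Some("d")]`. In group 0 the winning row has a null value with ordering 3. Group 1's tie at ordering 5 resolves to the value merged last. Letting ties favor the incoming state in `merge_batch` is an assumption of this sketch.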
   
   
   ## How are these changes tested?
   
   Validated against Spark for:
   
   * grouped and non-grouped queries
   * null handling (both ordering and value)
   * tie behavior (on equal orderings, the most recently seen value is selected)
   * struct ordering
   * both hash and sort aggregate plans
   
   Results match Spark semantics, and supported queries execute without falling
   back to Spark.
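
A small row-by-row reference model is one way to state the expected results for the cases above. The sketch below is illustrative only and is not the PR's test code. `max_by_reference` is a hypothetical helper, orderings are assumed to be `i64`, and rows are assumed to arrive in one fixed order (row order across partitions is not fixed in general). The model encodes the listed behaviors:

* null orderings are ignored
* null values may still be selected
* ties select the most recently seen value
* a group whose orderings are all null returns NULL

A non-grouped query corresponds to a single group.

```rust
use std::collections::BTreeMap;

/// Hypothetical reference model for `max_by(value, ordering)`, evaluated row by row
/// over (group, value, ordering) tuples in input order.
fn max_by_reference(rows: &[(&str, Option<&str>, Option<i64>)]) -> BTreeMap<String, Option<String>> {
    // group -> (selected value, its ordering); ordering None means nothing selected yet.
    let mut best: BTreeMap<String, (Option<String>, Option<i64>)> = BTreeMap::new();
    for &(group, value, ordering) in rows {
        // Every group appears in the output, even if all of its orderings are null.
        let entry = best.entry(group.to_string()).or_insert((None, None));
        // Rows with a null ordering never win.
        let Some(ord) = ordering else { continue };
        // `>=` means a tie selects the most recently seen row.
        if entry.1.map_or(true, |cur| ord >= cur) {
            *entry = (value.map(String::from), Some(ord));
        }
    }
    best.into_iter().map(|(group, (value, _))| (group, value)).collect()
}

fn main() {
    let rows: &[(&str, Option<&str>, Option<i64>)] = &[
        ("g1", Some("a"), Some(1)),
        ("g1", Some("b"), None),    // null ordering: ignored
        ("g1", None, Some(3)),      // a null value can still be selected
        ("g2", Some("c"), Some(5)),
        ("g2", Some("d"), Some(5)), // tie: the later row wins
        ("g3", Some("e"), None),    // only null orderings: result is NULL
    ];
    // Prints {"g1": None, "g2": Some("d"), "g3": None}
    println!("{:?}", max_by_reference(rows));
}
```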
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
