SubhamSinghal opened a new pull request, #22719:
URL: https://github.com/apache/datafusion/pull/22719

   ### Which issue does this PR close?
   
     - Closes #22667.
   
     ### Rationale for this change
     ### What changes are included in this PR?
   
     - DistinctArrayAggAccumulator state: HashSet<ScalarValue> → HashMap<ScalarValue, u64>.
     - update_batch: increments the per-value count instead of inserting into a set.
     - New retract_batch: decrements each value's count and removes the key when the count reaches zero. It mirrors the update_batch null-handling rules: with ignore_nulls, NULLs are skipped; otherwise NULL is tracked as a key.
     - supports_retract_batch() now returns true.
     - merge_batch is structurally unchanged. The wire state (List<value>) carries presence, not multiplicities, so after a merge a value's count is the number of partitions that emitted it. That is fine because evaluate only reads the keys. Refcount semantics are relied on only within a single accumulator instance (window execution, which doesn't merge).
     - New helper ScalarValue::size_of_hashmap<V, S> in datafusion-common, mirroring size_of_hashset.
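A minimal, self-contained sketch of the refcounting scheme described above, under simplifying assumptions: `Option<i64>` stands in for `ScalarValue` (with `None` playing the role of SQL NULL), and batches are plain slices rather than Arrow arrays. The type and method names (`DistinctCounts`, `update`, `retract`, `merge_presence`, `evaluate`) are hypothetical and only mirror the roles of `update_batch`, `retract_batch`, `merge_batch`, and `evaluate`; this is not the code in the PR.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical, simplified sketch of the refcounting idea; not DataFusion's API.
// `Option<i64>` stands in for `ScalarValue`, with `None` modelling SQL NULL.
struct DistinctCounts {
    /// Distinct value -> number of live rows (or partitions, after a merge) holding it.
    counts: HashMap<Option<i64>, u64>,
    ignore_nulls: bool,
}

impl DistinctCounts {
    fn new(ignore_nulls: bool) -> Self {
        Self { counts: HashMap::new(), ignore_nulls }
    }

    /// Role of `update_batch`: increment the count of every incoming value.
    fn update(&mut self, batch: &[Option<i64>]) {
        for v in batch {
            // With ignore_nulls, NULLs are skipped; otherwise NULL is an ordinary key.
            if self.ignore_nulls && v.is_none() {
                continue;
            }
            *self.counts.entry(*v).or_insert(0) += 1;
        }
    }

    /// Role of `retract_batch`: same NULL rules; decrement and drop the key at zero.
    fn retract(&mut self, batch: &[Option<i64>]) {
        for v in batch {
            if self.ignore_nulls && v.is_none() {
                continue;
            }
            if let Entry::Occupied(mut e) = self.counts.entry(*v) {
                *e.get_mut() -= 1;
                if *e.get() == 0 {
                    e.remove();
                }
            }
        }
    }

    /// Role of `merge_batch`: a partial state lists the values present in one
    /// partition (no multiplicities), so each partition adds 1 per value.
    fn merge_presence(&mut self, partial: &[Option<i64>]) {
        for v in partial {
            *self.counts.entry(*v).or_insert(0) += 1;
        }
    }

    /// Role of `evaluate`: only the keys are read; sorted for deterministic output.
    fn evaluate(&self) -> Vec<Option<i64>> {
        let mut keys: Vec<Option<i64>> = self.counts.keys().copied().collect();
        keys.sort();
        keys
    }
}
```

In this sketch, merging two partial states that both contain `1` leaves a count of 2 for `1` no matter how many rows held it. `evaluate` ignores that number, but it would be wrong to retract against, which matches the PR's point that refcounts are only trusted inside a single, unmerged accumulator.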
   
     ### Are these changes tested?
   
     Yes
   
     ### Are there any user-facing changes?
   
     Yes. array_agg(DISTINCT x) now works in bounded/sliding window frames, so queries that previously errored now succeed.
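To illustrate why a plain set cannot support retraction in a sliding frame, the hypothetical sketch below slides a `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` frame over plain `i64` values (no NULLs) and keeps two states side by side: a `HashSet` and a count map. The `slide` and `sorted` helpers and the hand-rolled frame loop are assumptions for illustration, not DataFusion's window executor.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

/// Collect values into a sorted Vec so frames can be compared deterministically.
fn sorted(it: impl IntoIterator<Item = i64>) -> Vec<i64> {
    let mut v: Vec<i64> = it.into_iter().collect();
    v.sort();
    v
}

/// Hypothetical driver, not DataFusion code: slides a
/// `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` frame over `values`, adding the
/// row that enters and retracting the row that leaves. Returns the distinct
/// values per output row as seen by (a HashSet state, a count-map state).
fn slide(values: &[i64]) -> (Vec<Vec<i64>>, Vec<Vec<i64>>) {
    let mut set: HashSet<i64> = HashSet::new();
    let mut counts: HashMap<i64, u64> = HashMap::new();
    let mut set_frames = Vec::new();
    let mut count_frames = Vec::new();

    for (i, &v) in values.iter().enumerate() {
        // Row i enters the frame.
        set.insert(v);
        *counts.entry(v).or_insert(0) += 1;

        // Row i - 2 leaves the frame (it is no longer "1 preceding").
        if i >= 2 {
            let old = values[i - 2];
            // Set-based retract: forgets `old` even if another row in the frame holds it.
            set.remove(&old);
            // Count-based retract: forgets `old` only when no row in the frame holds it.
            if let Entry::Occupied(mut e) = counts.entry(old) {
                *e.get_mut() -= 1;
                if *e.get() == 0 {
                    e.remove();
                }
            }
        }

        set_frames.push(sorted(set.iter().copied()));
        count_frames.push(sorted(counts.keys().copied()));
    }
    (set_frames, count_frames)
}
```

For the values `1, 1, 2`, the last frame holds rows with `1` and `2`. The set-based state reports only `[2]`, because retracting row 0 removed `1` even though row 1 still holds it, while the count map drops the count of `1` from 2 to 1 and correctly reports `[1, 2]`.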


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

