Jefffrey commented on issue #20121:
URL: https://github.com/apache/datafusion/issues/20121#issuecomment-3877655116

   > I added a couple of small tests to gather evidence:
   > • Dictionary DISTINCT: DistinctCountAccumulator currently treats dictionary 
scalars as distinct if they differ only by dictionary key width (e.g. 
Dictionary(Int8, "a") vs Dictionary(Int32, "a")), so COUNT(DISTINCT ...) would 
count them separately if mixed widths ever reach the accumulator.
   > • REE DISTINCT: same for RunEndEncoded; values differing only by run-end 
integer width (e.g. Int16 vs Int32) still count as distinct today.
   > • Execution realism (Union): I tried to force mixed dictionary key widths 
through a physical UnionExec. Current behavior rejects/doesn’t align schemas 
(it panics internally during alignment), so at least via UNION it’s hard for 
mixed key widths to arise under “normal” plan construction.
   > 
   > Net: strictness is observable and can affect DISTINCT semantics if mixed 
widths appear, but some common execution paths (like UNION) prevent mixed-width 
dictionary streams today.
   
   These use cases sound like they should indeed have been handled by type 
coercion.
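
A minimal sketch of the behaviour described in the quoted comment, using a toy model rather than DataFusion's actual types. `KeyWidth`, `ToyScalar`, `dict`, `count_distinct`, and `coerce_key_width` are all hypothetical names made up for illustration; the real `DistinctCountAccumulator` and the type coercion pass are not shown here. The assumption is only that distinctness is decided by hashing the whole scalar, key width included:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dictionary key type (not DataFusion's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum KeyWidth {
    Int8,
    Int32,
}

/// Toy scalar: a plain string, or a string wrapped in a dictionary
/// that remembers its key width.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyScalar {
    Utf8(String),
    Dictionary(KeyWidth, Box<ToyScalar>),
}

/// Convenience constructor for a dictionary-encoded string.
fn dict(width: KeyWidth, value: &str) -> ToyScalar {
    ToyScalar::Dictionary(width, Box::new(ToyScalar::Utf8(value.to_string())))
}

/// Strict distinct count: the key width is part of the hashed identity.
fn count_distinct(values: &[ToyScalar]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}

/// Toy "coercion": re-tag every dictionary scalar with one target key width
/// before it reaches the counting step.
fn coerce_key_width(value: &ToyScalar, target: KeyWidth) -> ToyScalar {
    match value {
        ToyScalar::Dictionary(_, inner) => ToyScalar::Dictionary(target, inner.clone()),
        other => other.clone(),
    }
}

fn main() {
    let mixed = vec![dict(KeyWidth::Int8, "a"), dict(KeyWidth::Int32, "a")];
    println!("mixed widths: {}", count_distinct(&mixed));

    let coerced: Vec<ToyScalar> = mixed
        .iter()
        .map(|v| coerce_key_width(v, KeyWidth::Int32))
        .collect();
    println!("after coercion: {}", count_distinct(&coerced));
}
```

In this toy model the mixed-width input counts as 2 distinct values and the coerced input as 1, which is the difference the quoted tests observed and the reason coercing to a common key width upstream would make the strictness moot.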


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
