kumarUjjawal commented on issue #20121:
URL: https://github.com/apache/datafusion/issues/20121#issuecomment-3872879635

   I added a couple of small tests to gather evidence:
    • Dictionary DISTINCT: DistinctCountAccumulator currently treats dictionary scalars as distinct if they
      differ only by dictionary key width (e.g. Dictionary(Int8, "a") vs Dictionary(Int32, "a")), so
      COUNT(DISTINCT ...) would count them separately if mixed widths ever reach the accumulator.
    • REE DISTINCT: the same holds for RunEndEncoded scalars that differ only by run-end integer width
      (e.g. Int16 vs Int32); they are counted as distinct today.
    • Execution realism (Union): I tried to force mixed dictionary key widths through a physical UnionExec.
      Current behavior rejects the mismatched schemas rather than aligning them (it panics internally
      during alignment), so at least via UNION it is hard for mixed key widths to arise under “normal”
      plan construction.

   Net: the strictness is observable and can affect DISTINCT semantics if mixed widths appear, but some
   common execution paths (like UNION) prevent mixed-width dictionary streams today.
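
To make the DISTINCT observation concrete, here is a minimal, std-only sketch of the effect. It is a toy model, not DataFusion code: `ToyScalar`, `IndexWidth`, `dict`, `ree`, `count_distinct_strict` and `count_distinct_logical` are hypothetical names made up for illustration. The only assumption carried over from the tests above is that equality and hashing include the key or run-end width.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the integer type used for dictionary keys
/// or run ends. Not an Arrow or DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum IndexWidth {
    Int8,
    Int16,
    Int32,
}

/// Hypothetical stand-in for a scalar value. The derived Eq/Hash include
/// the index width, which models the "strict" comparison described above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyScalar {
    Utf8(String),
    Dictionary(IndexWidth, Box<ToyScalar>),
    RunEndEncoded(IndexWidth, Box<ToyScalar>),
}

/// Illustrative constructor: a dictionary-encoded string scalar.
fn dict(width: IndexWidth, s: &str) -> ToyScalar {
    ToyScalar::Dictionary(width, Box::new(ToyScalar::Utf8(s.to_string())))
}

/// Illustrative constructor: a run-end-encoded string scalar.
fn ree(width: IndexWidth, s: &str) -> ToyScalar {
    ToyScalar::RunEndEncoded(width, Box::new(ToyScalar::Utf8(s.to_string())))
}

/// Strict counting: two scalars are the same only if the encoding,
/// the index width, and the inner value all match.
fn count_distinct_strict(values: &[ToyScalar]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}

/// Strip every encoding wrapper and return the innermost value.
fn logical_value(v: &ToyScalar) -> &ToyScalar {
    match v {
        ToyScalar::Dictionary(_, inner) | ToyScalar::RunEndEncoded(_, inner) => {
            logical_value(inner)
        }
        other => other,
    }
}

/// Logical counting: compare only the innermost values, ignoring the
/// encoding and its index width.
fn count_distinct_logical(values: &[ToyScalar]) -> usize {
    values.iter().map(logical_value).collect::<HashSet<_>>().len()
}
```

In this model, `[dict(IndexWidth::Int8, "a"), dict(IndexWidth::Int32, "a")]` counts as two values under the strict comparison and as one under the logical comparison, and the same holds for `ree` with Int16 vs Int32. Note that the logical variant also merges an encoded "a" with a plain `Utf8("a")`; whether that is desirable in DataFusion is a separate question that this sketch does not settle.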


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

