kumarUjjawal commented on issue #20121:
URL: https://github.com/apache/datafusion/issues/20121#issuecomment-3872879635
I added a couple of small tests to gather evidence:
• Dictionary DISTINCT: DistinctCountAccumulator currently treats dictionary scalars as distinct if they differ only by dictionary key width (e.g. Dictionary(Int8, "a") vs Dictionary(Int32, "a")), so COUNT(DISTINCT ...) would count them separately if mixed widths ever reach the accumulator.
• REE DISTINCT: likewise, RunEndEncoded values that differ only by run-end integer width (e.g. Int16 vs Int32) are still counted as distinct today.
• Execution realism (Union): I tried to force mixed dictionary key widths through a physical UnionExec. The current behavior rejects the mismatched schemas instead of aligning them (it panics internally during alignment), so, at least via UNION, it is hard for mixed key widths to arise under “normal” plan construction.
Net: strictness is observable and can affect DISTINCT semantics if mixed widths appear, but some common execution paths (like UNION) prevent mixed-width dictionary streams today.
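To make the observable difference concrete, here is a small self-contained Rust model of the behavior described in the first two bullets. It does not use DataFusion: ToyScalar, IntWidth, count_distinct_strict and count_distinct_logical are hypothetical names, and the model assumes that distinctness comes from equality/hashing that includes the encoding width, as the tests above suggest for DistinctCountAccumulator.

```rust
use std::collections::HashSet;

// Hypothetical integer widths used for dictionary keys / run ends.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum IntWidth {
    Int8,
    Int16,
    Int32,
}

// Hypothetical, simplified stand-in for a scalar value (NOT DataFusion's
// ScalarValue). The encoding width is stored next to the logical value,
// so the derived Eq/Hash take the width into account.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyScalar {
    Utf8(String),
    // Dictionary-encoded value: key width + logical value.
    Dictionary(IntWidth, Box<ToyScalar>),
    // Run-end-encoded value: run-end width + logical value.
    RunEndEncoded(IntWidth, Box<ToyScalar>),
}

impl ToyScalar {
    // Strip encoding wrappers, keeping only the logical value.
    fn logical(&self) -> &ToyScalar {
        match self {
            ToyScalar::Dictionary(_, v) | ToyScalar::RunEndEncoded(_, v) => v.logical(),
            other => other,
        }
    }
}

// Strict DISTINCT: the derived Eq/Hash include the encoding width.
fn count_distinct_strict(values: &[ToyScalar]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}

// Width-insensitive DISTINCT: only the logical values are compared.
fn count_distinct_logical(values: &[ToyScalar]) -> usize {
    values.iter().map(ToyScalar::logical).collect::<HashSet<_>>().len()
}

fn main() {
    let a = || Box::new(ToyScalar::Utf8("a".to_string()));

    // Same logical value "a", different dictionary key widths.
    let dict = vec![
        ToyScalar::Dictionary(IntWidth::Int8, a()),
        ToyScalar::Dictionary(IntWidth::Int32, a()),
    ];
    // Same logical value "a", different run-end widths.
    let ree = vec![
        ToyScalar::RunEndEncoded(IntWidth::Int16, a()),
        ToyScalar::RunEndEncoded(IntWidth::Int32, a()),
    ];

    // prints "dictionary: strict=2 logical=1"
    println!(
        "dictionary: strict={} logical={}",
        count_distinct_strict(&dict),
        count_distinct_logical(&dict)
    );
    // prints "ree: strict=2 logical=1"
    println!(
        "ree: strict={} logical={}",
        count_distinct_strict(&ree),
        count_distinct_logical(&ree)
    );
}
```

The width-insensitive variant is included only as a contrast: it shows what the count would be if encoding widths were ignored and only the logical value were compared.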