Jefffrey commented on issue #20121: URL: https://github.com/apache/datafusion/issues/20121#issuecomment-3877655116
> I added a couple of small tests to gather evidence:
>
> • Dictionary DISTINCT: DistinctCountAccumulator currently treats dictionary scalars as distinct if they differ only by dictionary key width (e.g. Dictionary(Int8, "a") vs Dictionary(Int32, "a")), so COUNT(DISTINCT ...) would count them separately if mixed widths ever reach the accumulator.
> • REE DISTINCT: the same holds for RunEndEncoded scalars; values differing only by run-end integer width (e.g. Int16 vs Int32) still count as distinct today.
> • Execution realism (Union): I tried to force mixed dictionary key widths through a physical UnionExec. Current behavior rejects/doesn't align schemas (it panics internally during alignment), so at least via UNION it's hard for mixed key widths to arise under "normal" plan construction.
>
> Net: strictness is observable and can affect DISTINCT semantics if mixed widths appear, but some common execution paths (like UNION) prevent mixed-width dictionary streams today.

These use cases sound like they should indeed have been handled by type coercion.
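To make the quoted DISTINCT observation concrete, here is a minimal, self-contained sketch. It is a toy model, not DataFusion code: the `Scalar` enum, its `key_bits` field, and both counting functions are hypothetical stand-ins, and the "logical" variant is only one possible way to normalize values; it is not what `DistinctCountAccumulator` does.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a scalar value (NOT DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Scalar {
    Utf8(String),
    /// A dictionary-encoded value tagged with its key width in bits.
    Dictionary { key_bits: u8, value: Box<Scalar> },
}

/// Strip dictionary wrappers so that only the logical value remains.
fn logical(s: &Scalar) -> &Scalar {
    match s {
        Scalar::Dictionary { value, .. } => logical(value),
        other => other,
    }
}

/// Strict counting: the encoding (key width) takes part in equality and hashing.
fn count_distinct_strict(values: &[Scalar]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}

/// Logical counting: values are compared after removing the dictionary wrapper.
fn count_distinct_logical(values: &[Scalar]) -> usize {
    values.iter().map(logical).collect::<HashSet<_>>().len()
}

fn main() {
    // Models Dictionary(Int8, "a") and Dictionary(Int32, "a") from the quoted comment.
    let values = vec![
        Scalar::Dictionary { key_bits: 8, value: Box::new(Scalar::Utf8("a".into())) },
        Scalar::Dictionary { key_bits: 32, value: Box::new(Scalar::Utf8("a".into())) },
    ];
    println!("strict:  {}", count_distinct_strict(&values)); // prints "strict:  2"
    println!("logical: {}", count_distinct_logical(&values)); // prints "logical: 1"
}
```

In this model the strict count is 2 because the key width is part of each value's identity. If type coercion unified the key widths before the values reached the accumulator, the two counts would coincide.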
