[ https://issues.apache.org/jira/browse/SPARK-57858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57858:
-----------------------------------
Labels: pull-request-available (was: )
> Emit BIN BY scaled DISTRIBUTE columns as produced attributes
> ------------------------------------------------------------
>
> Key: SPARK-57858
> URL: https://issues.apache.org/jira/browse/SPARK-57858
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Nikolina Vraneš
> Priority: Major
> Labels: pull-request-available
>
> The BIN BY relation operator proportionally rescales its DISTRIBUTE UNIFORM
> columns. The logical BinBy node currently passes those columns through
> child.output under the child's own ExprIds, even though execution rewrites
> their values. This violates Catalyst's invariant that an equal ExprId
> implies an equal value: no other operator changes a value while retaining
> the child's attribute.
>
> This sub-task turns the rescaled DISTRIBUTE columns into produced attributes
> with fresh ExprIds (same names, types, nullability, and positions) that
> shadow the inputs, mirroring Generate.generatorOutput. The input columns
> remain inputs read by the operator but are no longer part of its output.
> ResolveBinBy mints the new attributes, and DeduplicateRelations renews them
> across self-joins. Qualifiers and metadata are dropped, matching the
> computed-value semantics of expr AS value.
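A small toy model may make the invariant and the fix concrete. It is a sketch under stated assumptions, not Spark code: Catalyst itself is written in Scala, and the names Attribute, new_attribute, with_fresh_id, bin_by_output and check_invariant are hypothetical stand-ins. The "rescaling" in the usage note (halving a value) is likewise only illustrative. The sketch shows that passing a rescaled column through under the child's ExprId makes the same id carry two different values, while minting a fresh id for the produced column (same name, type, nullability and position, no qualifier or metadata) keeps the invariant.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-ins for Catalyst concepts; this is NOT Spark's API.
_next_expr_id = count(1)


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    nullable: bool
    expr_id: int
    qualifier: tuple = ()  # dropped (left empty) on produced attributes
    metadata: tuple = ()   # dropped (left empty) on produced attributes


def new_attribute(name, data_type, nullable):
    """Create an attribute with a freshly minted, globally unique ExprId."""
    return Attribute(name, data_type, nullable, next(_next_expr_id))


def with_fresh_id(attr):
    """Produced-attribute copy: same name/type/nullability, new ExprId,
    and no qualifier or metadata (like a computed `expr AS value`)."""
    return new_attribute(attr.name, attr.data_type, attr.nullable)


def bin_by_output(child_output, distribute_cols):
    """Output of a hypothetical BinBy node: each rescaled DISTRIBUTE column is
    replaced, at its original position, by a produced attribute with a fresh
    ExprId; every other child column passes through unchanged."""
    distribute_ids = {a.expr_id for a in distribute_cols}
    return [with_fresh_id(a) if a.expr_id in distribute_ids else a
            for a in child_output]


def check_invariant(child_output, child_row, op_output, op_row):
    """Return the ExprIds that appear in both child and operator output but
    carry different values, i.e. violations of "equal ExprId => equal value"."""
    child_values = {a.expr_id: v for a, v in zip(child_output, child_row)}
    return [a.expr_id for a, v in zip(op_output, op_row)
            if a.expr_id in child_values and child_values[a.expr_id] != v]
```

For example, with a child output of key and amount, a child row ("a", 10.0) and a rescaled row ("a", 5.0), checking against the child's own output (the current pass-through behavior) reports amount's ExprId as a violation, whereas checking against bin_by_output(child_output, [amount]) reports none. The design mirrors the description above: downstream references to the rescaled column bind to the new attribute, so no plan can mistake the pre-scaling and post-scaling values for the same thing.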
--
This message was sent by Atlassian Jira
(v8.20.10#820010)