[ 
https://issues.apache.org/jira/browse/SPARK-57858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57858:
-----------------------------------
    Labels: pull-request-available  (was: )

> Emit BIN BY scaled DISTRIBUTE columns as produced attributes
> ------------------------------------------------------------
>
>                 Key: SPARK-57858
>                 URL: https://issues.apache.org/jira/browse/SPARK-57858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Nikolina Vraneš
>            Priority: Major
>              Labels: pull-request-available
>
> The BIN BY relation operator proportionally rescales its DISTRIBUTE UNIFORM 
> columns. The logical BinBy node currently carries those columns through 
> child.output with the child's own ExprIds, even though execution rewrites 
> their values. This violates Catalyst's invariant that an equal ExprId 
> implies an equal value: no other operator changes a value under a retained 
> child attribute.
>  
> This sub-task makes the rescaled DISTRIBUTE columns produced attributes with 
> fresh ExprIds. They keep the same names, types, nullability, and positions 
> and shadow the inputs, mirroring Generate.generatorOutput. The input columns 
> remain the operator's read inputs but no longer appear in its output. 
> ResolveBinBy mints the new attributes and DeduplicateRelations renews them 
> across self-joins. Qualifier and metadata are dropped, matching the 
> computed-value semantics of expr AS value.
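
The output rule described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Catalyst code: the Attribute class, the fresh and bin_by_output helpers, and the column names are all hypothetical stand-ins, and metadata is left out for brevity. It shows rescaled columns being replaced in position by attributes with new ids and no qualifier, while untouched columns pass through unchanged.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-in for Catalyst's ExprId generator.
_next_id = count(1)


@dataclass(frozen=True)
class Attribute:
    """Toy attribute: not Catalyst's AttributeReference."""
    name: str
    data_type: str
    nullable: bool
    expr_id: int
    qualifier: tuple = ()


def fresh(attr):
    """Mint a produced attribute: same name, type, and nullability,
    but a new id and an empty qualifier."""
    return Attribute(attr.name, attr.data_type, attr.nullable, next(_next_id))


def bin_by_output(child_output, distribute_ids):
    """Keep every column in its position; swap each rescaled column
    (identified by its child id) for a freshly minted attribute."""
    return [fresh(a) if a.expr_id in distribute_ids else a
            for a in child_output]


# Assumed example input: "amount" is the rescaled DISTRIBUTE column.
child = [
    Attribute("ts", "timestamp", False, next(_next_id), ("t",)),
    Attribute("amount", "double", True, next(_next_id), ("t",)),
]
out = bin_by_output(child, {child[1].expr_id})
```

In this model, out[0] is the child's own "ts" attribute, while out[1] has the same name, type, and nullability as the child's "amount" but a different id, so nothing downstream can mistake the rescaled value for the original one.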



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
