[ https://issues.apache.org/jira/browse/SPARK-40149 ]
Sean R. Owen deleted comment on SPARK-40149:
--------------------------------------

> Star expansion after outer join asymmetrically includes joining key
> -------------------------------------------------------------------
>
>                 Key: SPARK-40149
>                 URL: https://issues.apache.org/jira/browse/SPARK-40149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>            Reporter: Otakar Truněček
>            Assignee: Wenchen Fan
>            Priority: Blocker
>             Fix For: 3.3.1, 3.2.3, 3.4.0
>
>
> When star expansion is used on the left side of a join, the result includes
> the joining key, while on the right side of the join it does not. I would
> expect the behaviour to be symmetric (the key included either on both sides
> or on neither).
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
>
> spark = SparkSession.builder.getOrCreate()
>
> # Left has ids 0-4 and right has ids 3-6, so the full outer join
> # produces matched rows plus unmatched rows from both sides.
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
>
> df_merged = (
>     df_left
>     .alias('left')
>     .join(df_right.alias('right'), on='id', how='full_outer')
>     # Pack all columns of each aliased side into a struct via star expansion.
>     .withColumn('left_all', f.struct('left.*'))
>     .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> Result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not
> included on either side.
> Result from Spark 3.1.3:
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+
> {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
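For comparison, the symmetric behaviour the report expects (the Spark 3.1.3 output, where neither struct contains the key) can be modelled without Spark at all. This is a hypothetical pure-Python illustration, not Spark's implementation: `full_outer_using_join` is an invented helper, rows are plain dicts, and it assumes each side is non-empty with unique, non-null keys (true for `spark.range`). The USING key is coalesced into a single `id`, and each side's "star expansion" is taken to be its non-key columns only, so `left_all` and `right_all` always have the same shape.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def full_outer_using_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hypothetical model of a full outer USING join followed by
    struct('left.*') and struct('right.*'), where the star expansion of
    each side excludes the join key (the Spark 3.1.3 behaviour).

    Assumes both inputs are non-empty and keys are unique per side.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # What `left.*` / `right.*` expand to in this model: every non-key column.
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]

    def pack(row: Optional[Row], cols: List[str]) -> Tuple[Any, ...]:
        # An unmatched side contributes a struct of nulls (None here).
        return tuple(row[c] if row is not None else None for c in cols)

    merged = []
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        l_row = left_by_key.get(k)
        r_row = right_by_key.get(k)
        merged.append({
            key: k,  # USING key: coalesce(left.key, right.key)
            "left_all": pack(l_row, left_cols),
            "right_all": pack(r_row, right_cols),
        })
    return merged


# Same data as the report: ids 0-4 on the left, 3-6 on the right.
left = [{"id": i, "val": "left"} for i in range(5)]
right = [{"id": i, "val": "right"} for i in range(3, 7)]

rows = full_outer_using_join(left, right, "id")
for row in rows:
    print(row)
```

The model sorts by key for readability; Spark's `show()` gives no such ordering guarantee, as the 3.1.3 table illustrates.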