[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776783#comment-17776783 ]
Bruce Robbins commented on SPARK-45583: --------------------------------------- Strangely, I cannot reproduce. Is some setting required? {noformat} sql("select version()").show(false) +----------------------------------------------+ |version() | +----------------------------------------------+ |3.5.0 ce5ddad990373636e94071e7cef2f31021add07b| +----------------------------------------------+ scala> sql("""WITH people as ( SELECT * FROM (VALUES (1, 'Peter'), (2, 'Homer'), (3, 'Ned'), (3, 'Jenny') ) AS Idiots(id, FirstName) ), location as ( SELECT * FROM (VALUES (1, 'sample0'), (1, 'sample1'), (2, 'sample2') ) as Locations(id, address) )SELECT * FROM people FULL OUTER JOIN location ON people.id = location.id""").show(false) | | | | | | | | | | | | | | | | | | | | +---+---------+----+-------+ |id |FirstName|id |address| +---+---------+----+-------+ |1 |Peter |1 |sample0| |1 |Peter |1 |sample1| |2 |Homer |2 |sample2| |3 |Ned |NULL|NULL | |3 |Jenny |NULL|NULL | +---+---------+----+-------+ scala> {noformat} > Spark SQL returning incorrect values for full outer join on keys with the > same name. > ------------------------------------------------------------------------------------ > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.5.0 > Reporter: Huw > Priority: Major > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` column is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org