[ 
https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776783#comment-17776783
 ] 

Bruce Robbins commented on SPARK-45583:
---------------------------------------

Strangely, I cannot reproduce. Is some setting required?
{noformat}
sql("select version()").show(false)
+----------------------------------------------+
|version()                                     |
+----------------------------------------------+
|3.5.0 ce5ddad990373636e94071e7cef2f31021add07b|
+----------------------------------------------+

scala> sql("""WITH people as (
  SELECT * FROM (VALUES 
    (1, 'Peter'), 
    (2, 'Homer'), 
    (3, 'Ned'),
    (3, 'Jenny')
  ) AS Idiots(id, FirstName)
), location as (
  SELECT * FROM (VALUES
    (1, 'sample0'),
    (1, 'sample1'),
    (2, 'sample2')  
  ) as Locations(id, address)
)SELECT
  *
FROM
  people
FULL OUTER JOIN
  location
ON
  people.id = location.id""").show(false)
     |      |      |      |      |      |      |      |      |      |      |    
  |      |      |      |      |      |      |      |      | 
+---+---------+----+-------+
|id |FirstName|id  |address|
+---+---------+----+-------+
|1  |Peter    |1   |sample0|
|1  |Peter    |1   |sample1|
|2  |Homer    |2   |sample2|
|3  |Ned      |NULL|NULL   |
|3  |Jenny    |NULL|NULL   |
+---+---------+----+-------+

scala> 
{noformat}

> Spark SQL returning incorrect values for full outer join on keys with the 
> same name.
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-45583
>                 URL: https://issues.apache.org/jira/browse/SPARK-45583
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Huw
>            Priority: Major
>
> {{The following query gives the wrong results.}}
>  
> {{WITH people as (}}
> {{  SELECT * FROM (VALUES }}
> {{    (1, 'Peter'), }}
> {{    (2, 'Homer'), }}
> {{    (3, 'Ned'),}}
> {{    (3, 'Jenny')}}
> {{  ) AS Idiots(id, FirstName)}}
> {{{}){}}}{{{}, location as ({}}}
> {{  SELECT * FROM (VALUES}}
> {{    (1, 'sample0'),}}
> {{    (1, 'sample1'),}}
> {{    (2, 'sample2')  }}
> {{  ) as Locations(id, address)}}
> {{{}){}}}{{{}SELECT{}}}
> {{  *}}
> {{FROM}}
> {{  people}}
> {{FULL OUTER JOIN}}
> {{  location}}
> {{ON}}
> {{  people.id = location.id}}
> {{We find the following table:}}
> ||id: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |null|Ned|null|null|
> |null|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|
> {{But clearly the first `id` column is wrong, the nulls should be 3.}}
> If we rename the id column in (only) the person table to pid we get the 
> correct results:
> ||pid: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |3|Ned|null|null|
> |3|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to