[
https://issues.apache.org/jira/browse/SPARK-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
RaviShankar KS updated SPARK-10967:
-----------------------------------
Environment: RHEL (was: Ubuntu on AWS)
> Incorrect Join behavior in filter conditions
> --------------------------------------------
>
> Key: SPARK-10967
> URL: https://issues.apache.org/jira/browse/SPARK-10967
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.4.1
> Environment: RHEL
> Reporter: RaviShankar KS
> Assignee: Josh Rosen
> Labels: sql, union
> Fix For: 1.5.0
>
>
> We notice that the join conditions are not working as expected in the case of
> nested columns being compared.
> As long as leaf column names are same under nested columns, should order
> matter ??
> Consider below example for two data frames d5 and d5_opp :
> d5.printSchema
> root
> |-- key: integer (nullable = false)
> |-- value: array (nullable = true)
> | |-- element: struct (containsNull = true)
> | | |-- col1: string (nullable = true)
> | | |-- col2: string (nullable = true)
> |-- value1: struct (nullable = false)
> | |-- col1: string (nullable = false)
> | |-- col2: string (nullable = false)
> d5_opp.printSchema
> root
> |-- key: integer (nullable = false)
> |-- value: array (nullable = true)
> | |-- element: struct (containsNull = true)
> | | |-- col2: string (nullable = true)
> | | |-- col1: string (nullable = true)
> |-- value1: struct (nullable = false)
> | |-- col2: string (nullable = false)
> | |-- col1: string (nullable = false)
> The below join statement do not work :
> d5.as("d5").join( d5_opp.as("d5_opp"), $"d5.value" === $"d5_opp.value",
> "inner").show
> Exception raised is :
> org.apache.spark.sql.AnalysisException: cannot resolve '(value = value)' due
> to data type mismatch: differing types in '(value = value)'
> (array<struct<col1:string,col2:string>> and
> array<struct<col2:string,col1:string>>).;
> d5.as("d5").join( d5_opp.as("d5_opp"), $"d5.value1" === $"d5_opp.value1",
> "inner").show
> Exception raised is :
> org.apache.spark.sql.AnalysisException: cannot resolve '(value1 = value1)'
> due to data type mismatch: differing types in '(value1 = value1)'
> (struct<col1:string,col2:string> and struct<col2:string,col1:string>).;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]