[
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Raducanu updated SPARK-20744:
------------------------------------
Description:
The following code reproduces the problem:
{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`,
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch:
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Similarly it won't work from SQL either, which is something that other SQL DB
support:
{code}
scala> spark.range(10).selectExpr("id as a", "id as
b").createOrReplaceTempView("tab1")
scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1),
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2,
1),named_struct(col1, 2, col2, 2))
+- SubqueryAlias tab1
+- Project [id#47L AS a#50L, id#47L AS b#51L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`)
= named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint> and
struct<col1:int,col2:int>).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
+- SubqueryAlias tab1
+- Project [id#47L AS a#50L, id#47L AS b#51L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Expressions such as (1,1) are apparently read as structs and then the types do
not match. Perhaps they should be arrays.
The following code works:
{code}
sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
{code}
This also works, but requires the cast:
{code}
sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint),
'b', cast(1 as bigint)))").show
{code}
was:
The following code reproduces the problem:
{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`,
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch:
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Similarly it won't work from SQL either, which is something that other SQL DB
support:
{code}
scala> spark.range(10).selectExpr("id as a", "id as
b").createOrReplaceTempView("tab1")
scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1),
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2,
1),named_struct(col1, 2, col2, 2))
+- SubqueryAlias tab1
+- Project [id#47L AS a#50L, id#47L AS b#51L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`)
= named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint> and
struct<col1:int,col2:int>).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
+- SubqueryAlias tab1
+- Project [id#47L AS a#50L, id#47L AS b#51L]
+- Range (0, 10, step=1, splits=Some(1))
{code}
Expressions such as (1,1) are apparently read as structs and then the types do
not match. Perhaps they should be arrays.
The following code works:
{code}
sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
{code}
> Predicates with multiple columns do not work
> --------------------------------------------
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
> +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly it won't work from SQL either, which is something that other SQL DB
> support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1),
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2,
> 1),named_struct(col1, 2, col2, 2))
> +- SubqueryAlias tab1
> +- Project [id#47L AS a#50L, id#47L AS b#51L]
> +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Other examples:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a',
> tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data
> type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b',
> tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint>
> and struct<col1:int,col2:int>).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
> +- SubqueryAlias tab1
> +- Project [id#47L AS a#50L, id#47L AS b#51L]
> +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Expressions such as (1,1) are apparently read as structs and then the types
> do not match. Perhaps they should be arrays.
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint),
> 'b', cast(1 as bigint)))").show
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]