[
https://issues.apache.org/jira/browse/SPARK-19731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884763#comment-15884763
]
Herman van Hovell commented on SPARK-19731:
-------------------------------------------
Why is array_contains not sufficient?
{{IN}} by definition works with a list of expressions or a subquery. Adding
support for arrays might introduce some interesting ambiguities.
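One possible reading of that ambiguity, sketched here as an illustration and not necessarily the case the comment has in mind: the parenthesized list after {{IN}} holds candidate values that are compared by equality, so an array placed there is a single candidate, not a collection to search. The last query assumes {{IN}} accepts array-typed operands when all types match; that is an assumption and may differ across Spark versions.

```sql
-- Current semantics: the list holds candidate values compared by equality.
SELECT 5 IN (1, 2, 3);                     -- false
SELECT array_contains(array(1, 2, 3), 5);  -- false

-- Here the array is one candidate value, so an int is compared to array<int>
-- and analysis fails with a data type mismatch (as reported in the issue).
SELECT 5 IN (array(1, 2, 3));

-- Assumption: IN accepts array operands when all types match.
-- Under the proposal, should this compare array(1, 2) to the listed array
-- (equality) or search for it among the array's elements (membership)?
SELECT array(1, 2) IN (array(1, 2));
```

Keeping membership tests in {{array_contains}} avoids having to pick between those two readings.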
> IN Operator should support arrays
> ---------------------------------
>
> Key: SPARK-19731
> URL: https://issues.apache.org/jira/browse/SPARK-19731
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0, 2.1.0
> Reporter: Shawn Lavelle
> Priority: Minor
>
> When the column type and array member type match, the IN operator should
> still operate on the array. This is useful for UDFs and Predicate SubQueries
> that return arrays.
> (This isn't necessarily extensible to all collections, but certainly applies
> to arrays.)
> Example:
> {{select 5 in array(1,2,3)}} should return false instead of throwing a
> ParseException, since the array's element type and the type of the tested
> value match.
> create table test (val int);
> insert into test values (1);
> select * from test;
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> *select val from test where array_contains(array(1,2,3), val);*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> {panel}
> *select val from test where val in (array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN
> (array(1, 2, 3)))' due to data type mismatch: Arguments must be same type;
> line 1 pos 31;
> 'Project ['val]
> +- 'Filter val#433 IN (array(1, 2, 3))
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` =
> `array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val`
> = `array(1, 2, 3)`)' (int and array<int>).;;
> 'Project ['val]
> +- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
>    :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
>    :     +- OneRowRelation$
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select explode(array(1,2,3)));*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> Note: See [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for
> how a predicate subquery breaks when applied to the DataSourceAPI
> {panel}