[ 
https://issues.apache.org/jira/browse/SPARK-19731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884763#comment-15884763
 ] 

Herman van Hovell commented on SPARK-19731:
-------------------------------------------

Why is array_contains not sufficient?

{{IN}} by definition works with a list of expressions or a subquery. Adding 
support for arrays might introduce some interesting ambiguities.
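
One possible reading of that ambiguity, sketched in plain Python rather than Spark: {{val IN (array(1,2,3))}} is already a valid one-item {{IN}} list, so the same syntax would carry two meanings. The helper names in_list and in_array are hypothetical and only model the two readings; they are not Spark APIs, and Python's loose equality stands in for what Spark would reject as a type mismatch.

```python
def in_list(value, candidates):
    """List reading: value IN (c1, c2, ...) compares value to each listed expression."""
    return any(value == c for c in candidates)


def in_array(value, arr):
    """Array reading (the proposal): value IN arr tests membership in one array operand."""
    return any(value == element for element in arr)


# Scalar column: the list reading compares 1 to the whole array,
# while the array reading looks inside it.
scalar_list_reading = in_list(1, [[1, 2, 3]])
scalar_array_reading = in_array(1, [1, 2, 3])

# Array-typed column: the two readings disagree in the other direction.
array_list_reading = in_list([1, 2, 3], [[1, 2, 3]])
array_array_reading = in_array([1, 2, 3], [1, 2, 3])

print(scalar_list_reading, scalar_array_reading)
print(array_list_reading, array_array_reading)
```

Under this model the two readings never agree for the same input, which is why an explicit function such as array_contains keeps the intent unambiguous.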

> IN Operator should support arrays
> ---------------------------------
>
>                 Key: SPARK-19731
>                 URL: https://issues.apache.org/jira/browse/SPARK-19731
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0, 2.1.0
>            Reporter: Shawn Lavelle
>            Priority: Minor
>
> When the column type and array member type match, the IN operator should 
> still operate on the array. This is useful for UDFs and Predicate SubQueries 
> that return arrays.  
> (This isn't necessarily extensible to all collections, but certainly applies 
> to arrays.)
> Example:
> select 5 in array(1,2,3) should return false instead of throwing a 
> ParseException, since the array's element type and the type of the value match.
> create table test (val int);
> insert into test values (1);
> select * from test;
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> *select val from test where array_contains(array(1,2,3), val);*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> {panel}
> *select val from test where val in (array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN 
> (array(1, 2, 3)))' due to data type mismatch: Arguments must be same type; 
> line 1 pos 31;
> 'Project ['val]
> +- 'Filter val#433 IN (array(1, 2, 3))
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select array(1,2,3));*
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` = 
> `array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val` 
> = `array(1, 2, 3)`)' (int and array<int>).;;
> 'Project ['val]
> +- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
>    :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
>    :     +- OneRowRelation$
>    +- MetastoreRelation test (state=,code=0)
> {panel}
> {panel}
> *select val from test where val in (select explode(array(1,2,3)));*
> +------+--+
> | val  |
> +------+--+
> | 1    |
> +------+--+
> Note: See [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for 
> how a predicate subquery breaks when applied to the DataSourceAPI
> {panel}


