Shawn Lavelle created SPARK-19731:
-------------------------------------

             Summary: IN Operator should support arrays
                 Key: SPARK-19731
                 URL: https://issues.apache.org/jira/browse/SPARK-19731
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0, 2.0.0, 1.6.2
            Reporter: Shawn Lavelle
            Priority: Minor


When the column type matches the array's element type, the IN operator should 
accept the array and test membership against its elements. This is useful for 
UDFs and predicate subqueries that return arrays.

(This isn't necessarily extensible to all collections, but certainly applies to 
arrays.)
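
The requested rule can be modelled outside Spark to make the intent concrete. 
The sketch below is plain Python, not Spark code: the function name sql_in is 
hypothetical, type matching is simplified to exact Python type equality (no 
implicit casts), and NULL handling is ignored.

```python
from typing import Any


def sql_in(value: Any, *candidates: Any) -> bool:
    """Hypothetical model of the proposed IN semantics; not Spark's implementation."""
    # Proposed rule: a single array argument whose element type matches the
    # type of the value is searched element by element.
    if len(candidates) == 1 and isinstance(candidates[0], (list, tuple)):
        array = candidates[0]
        if all(type(item) is type(value) for item in array):
            return value in array
        raise TypeError("array element type does not match the value type")

    # Otherwise the candidates are an ordinary IN list, and every entry
    # must have the same type as the value (simplified: no implicit casts).
    if any(type(item) is not type(value) for item in candidates):
        raise TypeError("arguments must be same type")
    return value in candidates


print(sql_in(5, [1, 2, 3]))  # models: select 5 in array(1,2,3)
print(sql_in(1, [1, 2, 3]))  # models: val in (array(1,2,3)) with val = 1
```

Under this model an array of a different element type is still rejected, so 
only the matching-type case from this request changes behaviour.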

Example:
select 5 in array(1,2,3) should return false instead of raising a 
ParseException, since the element type of the array matches the type of the 
value being tested.

create table test (val int);
insert into test values (1);
select * from test;
+------+--+
| val  |
+------+--+
| 1    |
+------+--+
*select val from test where array_contains(array(1,2,3), val);*
+------+--+
| val  |
+------+--+
| 1    |
+------+--+

{panel}
*select val from test where val in (array(1,2,3));*
Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN 
(array(1, 2, 3)))' due to data type mismatch: Arguments must be same type; line 
1 pos 31;
'Project ['val]
+- 'Filter val#433 IN (array(1, 2, 3))
   +- MetastoreRelation test (state=,code=0)
{panel}

{panel}
*select val from test where val in (select array(1,2,3));*
Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` = 
`array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val` = 
`array(1, 2, 3)`)' (int and array<int>).;;
'Project ['val]
+- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
   :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
   :     +- OneRowRelation$
   +- MetastoreRelation test (state=,code=0)
{panel}
{panel}
*select val from test where val in (select explode(array(1,2,3)));*
+------+--+
| val  |
+------+--+
| 1    |
+------+--+

Note: See [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for 
how a predicate subquery breaks when applied to the DataSourceAPI.
{panel}



