[ https://issues.apache.org/jira/browse/SPARK-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662646#comment-16662646 ]

Wenchen Fan commented on SPARK-25823:
-------------------------------------

[~dongjoon] Good catch! I think we should update `collect` to match the behavior
of map lookup.
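
For illustration, here is how a map with a duplicate key can produce one answer under lookup and a different one when collected. This is a plain-Python model, not Spark code. It makes two assumptions: lookup returns the value of the first matching entry, and `collect` builds a host-language map in which later entries overwrite earlier ones. `collect_first_wins` is a hypothetical helper showing what "matching the behavior of map lookup" could mean.

```python
# Plain-Python model (not Spark code) of a map value holding a duplicate key,
# e.g. the result of map_concat(map(1, 2), map(1, 3)) in Spark 2.4.
# Assumption: the map is stored as an ordered list of (key, value) entries.
entries = [(1, 2), (1, 3)]


def lookup_first(entries, key):
    """Assumed lookup semantics: return the value of the FIRST matching entry."""
    for k, v in entries:
        if k == key:
            return v
    return None


def collect_last_wins(entries):
    """Building a dict from the entries keeps the LAST value for each key."""
    return dict(entries)


def collect_first_wins(entries):
    """Hypothetical collect that agrees with first-match lookup."""
    result = {}
    for k, v in entries:
        result.setdefault(k, v)  # only the first value per key is stored
    return result


print(lookup_first(entries, 1))        # 2
print(collect_last_wins(entries)[1])   # 3
print(collect_first_wins(entries)[1])  # 2
```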

Going back to this ticket, the current behavior differs from Presto but is
consistent with how the map type behaves in Spark. If others think this is
serious, I'd suggest we remove the map-related higher-order functions from 2.4.
However, we can't remove `CreateMap`, so the behavior of the map type in Spark
would remain as it is.

Personally, I don't want to remove the map-related higher-order functions, as
they follow the map type semantics in Spark and are implemented correctly. The
only benefit of removing them that I can think of is to avoid spreading the
unexpected behavior of the map type in Spark.

In the master branch, we can work on making Spark's map type consistent with
Presto.

> map_filter can generate incorrect data
> --------------------------------------
>
>                 Key: SPARK-25823
>                 URL: https://issues.apache.org/jira/browse/SPARK-25823
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: correctness
>
> This is not a regression because it occurs only in new higher-order functions
> like `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap`
> allows duplicate keys. If we want to allow this difference in the new
> higher-order functions, we should at least add a warning about this difference
> to these functions after the RC4 vote passes. Otherwise, this will surprise
> Presto-based users.
> *Spark 2.4*
> {code:java}
> spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM 
> (SELECT map_concat(map(1,2), map(1,3)) m);
> spark-sql> SELECT * FROM t;
> {1:3} {1:2}
> {code}
> *Presto 0.212*
> {code:java}
> presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT 
> map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
>    a   | _col1
> -------+-------
>  {1=3} | {}
> {code}
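
The two outputs above can be reproduced with a plain-Python model (not Spark or Presto code) that treats a map as an ordered list of (key, value) entries. The model assumes two things. Spark 2.4's `map_concat` keeps both entries for key `1`, even though `spark-sql` prints `m` as `{1:3}`. Presto's `map_concat` keeps only the last value for a duplicated key, which matches the `{1=3}` shown above. `map_filter` then sees the hidden `(1, 2)` entry only in the Spark case. All function names below are hypothetical.

```python
# Plain-Python model (not Spark or Presto code) of the map_filter example above.
# A map is modeled as an ordered list of (key, value) entries.


def map_concat_keep_duplicates(*maps):
    """Spark 2.4-style concat as described in the ticket: duplicate keys are kept."""
    out = []
    for m in maps:
        out.extend(m)
    return out


def map_concat_last_wins(*maps):
    """Presto-style concat: the value from the last map wins for a duplicated key."""
    merged = {}
    for m in maps:
        for k, v in m:
            merged[k] = v
    return list(merged.items())


def map_filter(entries, pred):
    """Keep every entry whose (key, value) satisfies pred."""
    return [(k, v) for k, v in entries if pred(k, v)]


def value_is_two(k, v):
    """The lambda (k, v) -> v = 2 from the queries above."""
    return v == 2


spark_m = map_concat_keep_duplicates([(1, 2)], [(1, 3)])
presto_m = map_concat_last_wins([(1, 2)], [(1, 3)])

print(spark_m, map_filter(spark_m, value_is_two))    # [(1, 2), (1, 3)] [(1, 2)]
print(presto_m, map_filter(presto_m, value_is_two))  # [(1, 3)] []
```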



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
