[ https://issues.apache.org/jira/browse/SPARK-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662646#comment-16662646 ]
Wenchen Fan commented on SPARK-25823:
-------------------------------------

[~dongjoon] good catch! I think we should update collect to match the behavior of map lookup.

Going back to this ticket, the current behavior is different from Presto but is consistent with how the map type behaves in Spark. If others think this is serious, I'd suggest we remove the map-related higher-order functions from 2.4. However, we can't remove `CreateMap`, so the behavior of the map type in Spark would still be as it was.

Personally I don't want to remove the map-related higher-order functions, as they follow the map type semantics in Spark and are implemented correctly. The only benefit of removing them that I can think of is to not spread the unexpected behavior of the map type in Spark.

In the master branch we can work on making the Spark map type consistent with Presto.

> map_filter can generate incorrect data
> --------------------------------------
>
>                 Key: SPARK-25823
>                 URL: https://issues.apache.org/jira/browse/SPARK-25823
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: correctness
>
> This is not a regression because this occurs in new higher-order functions like
> `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows
> duplicate keys. If we want to allow this difference in the new higher-order
> functions, we had better add some warning about this difference to these
> functions, at least after the RC4 vote passes. Otherwise, this will surprise
> Presto-based users.
> *Spark 2.4*
> {code:java}
> spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM
> (SELECT map_concat(map(1,2), map(1,3)) m);
> spark-sql> SELECT * FROM t;
> {1:3} {1:2}
> {code}
> *Presto 0.212*
> {code:java}
> presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT
> map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
>    a   | _col1
> -------+-------
>  {1=3} | {}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
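
For readers following the two outputs quoted in the issue, here is a rough Python sketch of why they differ. It is only a toy model, not Spark's or Presto's actual implementation: a map is modeled as a plain list of (key, value) entries, and the helper names (`map_concat`, `map_filter`, `dedup_last_wins`) are hypothetical stand-ins chosen to mirror the SQL functions. The assumption is that Spark 2.4 keeps both entries for the duplicate key and filters each entry, collapsing duplicates (last value wins) only when the map is displayed, while Presto collapses duplicates when the map is built.

```python
# Toy model only: a "map" is a list of (key, value) entries.
# This is an illustration, not Spark's or Presto's real code.

def map_concat(*maps):
    """Concatenate entry lists without removing duplicate keys."""
    return [entry for m in maps for entry in m]

def map_filter(entries, pred):
    """Keep every entry whose (key, value) pair satisfies pred."""
    return [(k, v) for k, v in entries if pred(k, v)]

def dedup_last_wins(entries):
    """Collapse duplicate keys, keeping the last value seen per key."""
    return dict(entries)

# map_concat(map(1,2), map(1,3)) with duplicates preserved
m = map_concat([(1, 2)], [(1, 3)])

# Assumed Spark 2.4 model: filter sees both entries, dedup happens on display.
spark_m = dedup_last_wins(m)
spark_c = dedup_last_wins(map_filter(m, lambda k, v: v == 2))

# Assumed Presto model: dedup happens at construction, then the filter runs.
presto_a = dedup_last_wins(m)
presto_c = dict(map_filter(list(presto_a.items()), lambda k, v: v == 2))

print(spark_m, spark_c)    # matches the Spark 2.4 row in the report
print(presto_a, presto_c)  # matches the Presto 0.212 row in the report
```

Under these assumptions, the filtered column differs only because the entry (1, 2) is still present when the filter runs in the first model and has already been discarded in the second.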