[ https://issues.apache.org/jira/browse/SPARK-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662618#comment-16662618 ]
Dongjoon Hyun commented on SPARK-25823:
---------------------------------------

Right. This is a long-standing issue, present since `CreateMap` was added. And when we collect, the last entry wins:

{code}
scala> sql("SELECT map(1,2,1,3)").collect
res0: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
{code}

> map_filter can generate incorrect data
> --------------------------------------
>
>                 Key: SPARK-25823
>                 URL: https://issues.apache.org/jira/browse/SPARK-25823
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: correctness
>
> This is not a regression, because this occurs in new higher-order functions like
> `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows
> duplicate keys. If we want to allow this difference in the new higher-order
> functions, we had better add a warning about this difference to these
> functions, at least after the RC4 vote passes. Otherwise, this will surprise
> Presto-based users.
>
> *Spark 2.4*
> {code:java}
> spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM
> (SELECT map_concat(map(1,2), map(1,3)) m);
> spark-sql> SELECT * FROM t;
> {1:3}	{1:2}
> {code}
>
> *Presto 0.212*
> {code:java}
> presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT
> map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
>    a   | _col1
> -------+-------
>  {1=3} | {}
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
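To make the divergence concrete, here is a minimal sketch (plain Python, not Spark or Presto code) modeling the assumed behaviors: Spark 2.4's `map_filter` operates on the raw entry list where `CreateMap`/`map_concat` have kept duplicate keys, with last-entry-wins applied only on display/collect, while Presto deduplicates the map eagerly before filtering. The function names below are hypothetical and only illustrate the two semantics.

```python
# Hypothetical model of the two engines' map semantics (not real API code).

def spark_map_filter(entries, pred):
    # Spark 2.4: filter runs over the raw entries, duplicates included.
    return [(k, v) for k, v in entries if pred(k, v)]

def last_entry_wins(entries):
    # What collect/display does afterwards: dict() keeps the last value per key.
    return dict(entries)

def presto_map_filter(entries, pred):
    # Presto: duplicates are resolved (last wins) when the map is built,
    # so the filter only ever sees the deduplicated entries.
    deduped = dict(entries)
    return {k: v for k, v in deduped.items() if pred(k, v)}

entries = [(1, 2), (1, 3)]        # map_concat(map(1,2), map(1,3))
pred = lambda k, v: v == 2        # (k, v) -> v = 2

print(last_entry_wins(spark_map_filter(entries, pred)))  # {1: 2}  (Spark's {1:2})
print(presto_map_filter(entries, pred))                  # {}      (Presto's {})
```

Under this model, Spark's filter matches the duplicate entry `(1, 2)` even though the displayed map is `{1:3}`, which is exactly the surprising `{1:3} / {1:2}` pair in the report, while Presto returns an empty map.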