[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663553#comment-16663553 ]

Marco Gaido commented on SPARK-25829:
-------------------------------------

I think the main issue is that, since this is not a SQL-standard feature, every DB 
handles it in its own way. E.g. Postgres just says that when duplicate keys are 
entered, there is no guarantee which value is kept 
(https://www.postgresql.org/docs/9.0/static/hstore.html); not a great policy, I 
agree. Maybe we can check Hive, since Spark takes much of its behavior from it. 
Anyway, I think we just need to define a coherent behavior across the codebase.
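Just to make the options concrete, here is a minimal sketch (plain Scala, not Spark internals; the function and policy names are hypothetical) of what picking one explicit policy for duplicated keys could look like:

{code}
object DedupPolicySketch {
  sealed trait DuplicateKeyPolicy
  case object FirstWins extends DuplicateKeyPolicy  // "earlier entry wins", as in the lookup example below
  case object LastWins  extends DuplicateKeyPolicy
  case object Fail      extends DuplicateKeyPolicy  // Presto-like behavior

  // Hypothetical helper: build a map from (key, value) pairs under an explicit policy.
  def buildMap[K, V](entries: Seq[(K, V)], policy: DuplicateKeyPolicy): Map[K, V] =
    policy match {
      case FirstWins =>
        entries.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
          if (acc.contains(k)) acc else acc + (k -> v)
        }
      case LastWins =>
        entries.toMap  // Scala's toMap keeps the last value for a duplicated key
      case Fail =>
        entries.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
          require(!acc.contains(k), s"Duplicate map key: $k")
          acc + (k -> v)
        }
    }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(1 -> 2, 1 -> 3)
    println(buildMap(pairs, FirstWins)) // Map(1 -> 2)
    println(buildMap(pairs, LastWins))  // Map(1 -> 3)
    // buildMap(pairs, Fail)            // would throw IllegalArgumentException
  }
}
{code}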

One consideration is that enforcing a policy like Presto's (e.g. failing in such a 
situation) has 2 main drawbacks:
 - We usually don't fail on bad data (most of the time we return NULL instead of 
throwing an exception in such situations), as in the first sketch below;
 - Checking whether a key is already present is very inefficient with the current 
{{ArrayData}} representation (see the second sketch below). We could work around 
this, but the workaround would have to be replicated in every function which can 
produce map keys, so it would be hard to maintain.


> Duplicated map keys are not handled consistently
> ------------------------------------------------
>
>                 Key: SPARK-25829
>                 URL: https://issues.apache.org/jira/browse/SPARK-25829
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> In Spark SQL, we apply "earlier entry wins" semantic to duplicated map keys. 
> e.g.
> {code}
> scala> sql("SELECT map(1,2,1,3)[1]").show
> +------------------+
> |map(1, 2, 1, 3)[1]|
> +------------------+
> |                 2|
> +------------------+
> {code}
> However, this handling is not applied consistently.


