nuno-faria commented on code in PR #21720:
URL: https://github.com/apache/datafusion/pull/21720#discussion_r3106602830


##########
datafusion/spark/src/function/map/utils.rs:
##########
@@ -202,17 +202,20 @@ fn map_deduplicate_keys(
                         cur_keys_offset + cur_entry_idx,
                     )?
                     .compacted();
+                    // Enforce Spark's default `spark.sql.mapKeyDedupPolicy=EXCEPTION`.
+                    // Native LAST_WIN support is deferred to a follow-up.
                     if seen_keys.contains(&key) {
-                        // TODO: implement configuration and logic for spark.sql.mapKeyDedupPolicy=EXCEPTION (this is default spark-config)
-                        // exec_err!("invalid argument: duplicate keys in map")
-                        // https://github.com/apache/spark/blob/cf3a34e19dfcf70e2d679217ff1ba21302212472/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4961
-                    } else {
-                        // This code implements deduplication logic for spark.sql.mapKeyDedupPolicy=LAST_WIN (this is NOT default spark-config)
-                        keys_mask_one[cur_entry_idx] = true;
-                        values_mask_one[cur_entry_idx] = true;
-                        seen_keys.insert(key);
-                        new_last_offset += 1;
+                        return exec_err!(
+                            "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found, \
+                             please check the input data. If you want to remove the \
+                             duplicated keys, you can set spark.sql.mapKeyDedupPolicy \

Review Comment:
   This error message points to the `spark.sql.mapKeyDedupPolicy` config, which is Spark-specific and does not exist in DataFusion.
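
   One possible direction, as a minimal standalone sketch rather than the PR's actual code: keep the `[DUPLICATED_MAP_KEY]` prefix but drop the sentence that mentions the Spark config. The helper name `check_unique_keys` and the `Result<(), String>` return type are hypothetical, chosen only to keep the example self-contained. The real code would keep returning through `exec_err!`.

```rust
use std::collections::HashSet;
use std::fmt::Display;
use std::hash::Hash;

/// Hypothetical helper (not part of DataFusion): fails on the first
/// duplicate key, with a message that does not mention any Spark config.
fn check_unique_keys<K: Eq + Hash + Display>(keys: &[K]) -> Result<(), String> {
    let mut seen = HashSet::with_capacity(keys.len());
    for key in keys {
        // `insert` returns false when the key was already present.
        if !seen.insert(key) {
            return Err(format!(
                "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found, \
                 please check the input data."
            ));
        }
    }
    Ok(())
}
```

   With this shape, `check_unique_keys(&["a", "b", "a"])` returns an `Err` naming the key `a`, and the message contains no reference to a setting that DataFusion users cannot change.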



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

