nuno-faria commented on code in PR #21720:
URL: https://github.com/apache/datafusion/pull/21720#discussion_r3106602830
##########
datafusion/spark/src/function/map/utils.rs:
##########
@@ -202,17 +202,20 @@ fn map_deduplicate_keys(
cur_keys_offset + cur_entry_idx,
)?
.compacted();
+ // Enforce Spark's default `spark.sql.mapKeyDedupPolicy=EXCEPTION`.
+ // Native LAST_WIN support is deferred to a follow-up.
if seen_keys.contains(&key) {
- // TODO: implement configuration and logic for spark.sql.mapKeyDedupPolicy=EXCEPTION (this is default spark-config)
- // exec_err!("invalid argument: duplicate keys in map")
- // https://github.com/apache/spark/blob/cf3a34e19dfcf70e2d679217ff1ba21302212472/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4961
- } else {
- // This code implements deduplication logic for spark.sql.mapKeyDedupPolicy=LAST_WIN (this is NOT default spark-config)
- keys_mask_one[cur_entry_idx] = true;
- values_mask_one[cur_entry_idx] = true;
- seen_keys.insert(key);
- new_last_offset += 1;
+ return exec_err!(
+ "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found, \
+ please check the input data. If you want to remove the \
+ duplicated keys, you can set spark.sql.mapKeyDedupPolicy \
Review Comment:
This error message points to the `spark.sql.mapKeyDedupPolicy` config, which
is Spark-specific and does not exist in DataFusion.
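
One possible direction, as a minimal sketch only: keep the `[DUPLICATED_MAP_KEY]` prefix and the "check the input data" hint, and drop the sentence that refers to the Spark config. The helper below is hypothetical (`check_unique_keys` is not part of the PR); it assumes plain slices instead of Arrow arrays and returns a `String` error instead of going through `exec_err!`, so that it compiles on its own.

```rust
use std::collections::HashSet;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical stand-in for the duplicate check in `map_deduplicate_keys`.
/// Returns an error on the first repeated key; the message does not mention
/// any Spark-only configuration.
fn check_unique_keys<K: Eq + Hash + Debug>(keys: &[K]) -> Result<(), String> {
    let mut seen_keys = HashSet::new();
    for key in keys {
        // `HashSet::insert` returns false when the key was already present.
        if !seen_keys.insert(key) {
            return Err(format!(
                "[DUPLICATED_MAP_KEY] Duplicate map key {key:?} was found, \
                 please check the input data."
            ));
        }
    }
    Ok(())
}
```

Under these assumptions, `check_unique_keys(&[1, 2, 3])` returns `Ok(())`, while `check_unique_keys(&[1, 2, 1])` returns an `Err` whose message names the key `1` and makes no reference to `spark.sql.mapKeyDedupPolicy`.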