coderfender commented on code in PR #21720:
URL: https://github.com/apache/datafusion/pull/21720#discussion_r3107149796
##########
datafusion/spark/src/function/map/utils.rs:
##########
@@ -202,17 +202,20 @@ fn map_deduplicate_keys(
cur_keys_offset + cur_entry_idx,
)?
.compacted();
+ // Enforce Spark's default `spark.sql.mapKeyDedupPolicy=EXCEPTION`.
+ // Native LAST_WIN support is deferred to a follow-up.
if seen_keys.contains(&key) {
- // TODO: implement configuration and logic for spark.sql.mapKeyDedupPolicy=EXCEPTION (this is default spark-config)
- // exec_err!("invalid argument: duplicate keys in map")
- // https://github.com/apache/spark/blob/cf3a34e19dfcf70e2d679217ff1ba21302212472/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4961
- } else {
- // This code implements deduplication logic for spark.sql.mapKeyDedupPolicy=LAST_WIN (this is NOT default spark-config)
- keys_mask_one[cur_entry_idx] = true;
- values_mask_one[cur_entry_idx] = true;
- seen_keys.insert(key);
- new_last_offset += 1;
+ return exec_err!(
+ "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found, \
+ please check the input data. If you want to remove the \
+ duplicated keys, you can set spark.sql.mapKeyDedupPolicy \
Review Comment:
We might want to keep the error message, but make it more geared towards DataFusion (DF).
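For context on the two policies discussed above, here is a minimal, std-only Rust sketch of the per-row duplicate-key check. It is not the PR's implementation (which works on key/value masks and offsets inside `map_deduplicate_keys`); `MapKeyDedupPolicy` and `dedup_map_entries` are hypothetical names. The LAST_WIN branch assumes Spark's semantics as I understand them: the key keeps its first position and the later value overwrites the earlier one.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::hash::Hash;

/// Hypothetical mirror of Spark's `spark.sql.mapKeyDedupPolicy` values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MapKeyDedupPolicy {
    /// Spark's default: reject any duplicate key.
    Exception,
    /// Assumed semantics: the later value overwrites the earlier one,
    /// and the key keeps the position of its first occurrence.
    LastWin,
}

/// Builds the entries of one map row under `policy`.
/// On a duplicate under `Exception`, returns an error string shaped like
/// the PR's DUPLICATED_MAP_KEY message (hypothetical helper, not DataFusion API).
pub fn dedup_map_entries<K, V>(
    entries: Vec<(K, V)>,
    policy: MapKeyDedupPolicy,
) -> Result<Vec<(K, V)>, String>
where
    K: Eq + Hash + Clone + Display,
{
    // Maps each key to its index in `out`, so LAST_WIN can overwrite in place.
    let mut index_of: HashMap<K, usize> = HashMap::new();
    let mut out: Vec<(K, V)> = Vec::with_capacity(entries.len());

    for (key, value) in entries {
        // Copy the index out so the map is no longer borrowed below.
        let existing = index_of.get(&key).copied();
        match existing {
            None => {
                index_of.insert(key.clone(), out.len());
                out.push((key, value));
            }
            Some(idx) => match policy {
                MapKeyDedupPolicy::Exception => {
                    return Err(format!(
                        "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found"
                    ));
                }
                MapKeyDedupPolicy::LastWin => {
                    out[idx].1 = value;
                }
            },
        }
    }
    Ok(out)
}

fn main() {
    // Same entries as the duplicate-key case in map_from_entries.slt.
    let rows = || {
        vec![
            (true, Some("a")),
            (false, Some("b")),
            (true, Some("c")),
            (false, None),
            (true, Some("d")),
        ]
    };

    // prints: [DUPLICATED_MAP_KEY] Duplicate map key true was found
    match dedup_map_entries(rows(), MapKeyDedupPolicy::Exception) {
        Err(msg) => println!("{msg}"),
        Ok(_) => unreachable!("duplicate keys must be rejected"),
    }

    // prints: [(true, Some("d")), (false, None)]
    let last_win = dedup_map_entries(rows(), MapKeyDedupPolicy::LastWin).unwrap();
    println!("{last_win:?}");
}
```

Under EXCEPTION the first repeated key (`true`) aborts the whole row, which matches the expected error in the updated sqllogictest; the LAST_WIN branch only illustrates what the deferred follow-up might produce.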
##########
datafusion/sqllogictest/test_files/spark/map/map_from_entries.slt:
##########
@@ -151,14 +151,12 @@ SELECT
----
{outer_key1: {inner_a: 1, inner_b: 2}, outer_key2: {inner_x: 10, inner_y: 20, inner_z: 30}}
-# Test with duplicate keys
-query ?
+# Test with duplicate keys: raises DUPLICATED_MAP_KEY under Spark's default policy
+query error DataFusion error: Execution error: \[DUPLICATED_MAP_KEY\] Duplicate map key true was found
SELECT map_from_entries(array(
- struct(true, 'a'),
- struct(false, 'b'),
+ struct(true, 'a'),
+ struct(false, 'b'),
struct(true, 'c'),
- struct(false, cast(NULL as string)),
+ struct(false, cast(NULL as string)),
struct(true, 'd')
Review Comment:
Might want to revert unwanted formatting changes here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]