goldmedal commented on code in PR #11547: URL: https://github.com/apache/datafusion/pull/11547#discussion_r1688090369
########## datafusion/substrait/src/logical_plan/consumer.rs: ########## @@ -1785,6 +1798,98 @@ fn from_substrait_literal( } } } + Some(LiteralType::Map(m)) => { + // Each entry should start the name index from the same value, then we increase it + // once at the end + let mut entry_name_idx = *name_idx; + let entries = m + .key_values + .iter() + .map(|kv| { + entry_name_idx = *name_idx; + let key_sv = from_substrait_literal( + kv.key.as_ref().unwrap(), + extensions, + dfs_names, + &mut entry_name_idx, + )?; + let value_sv = from_substrait_literal( + kv.value.as_ref().unwrap(), + extensions, + dfs_names, + &mut entry_name_idx, + )?; + ScalarStructBuilder::new() Review Comment: > this was the most high-level way of creating the map I could think of, lmk if you have better ideas! It looks good to me. If you intend to be more efficient, I suggest referring to how `make_map_batch_internal` creates a `MapArray`. https://github.com/apache/datafusion/blob/77311a5896272c7ed252d8cd53d48ec6ea7c0ccf/datafusion/functions-array/src/map.rs#L80 You need to partition the key and value pairs into two arrays, and build the `MapArray` based on them. Maybe you can refer to `plan_make_map`. https://github.com/apache/datafusion/blob/77311a5896272c7ed252d8cd53d48ec6ea7c0ccf/datafusion/functions-array/src/planner.rs#L104 However, I think it just some improvements. We don't need to do that in this PR. ########## datafusion/common/src/hash_utils.rs: ########## @@ -692,6 +732,64 @@ mod tests { assert_eq!(hashes[0], hashes[1]); } + #[test] + // Tests actual values of hashes, which are different if forcing collisions + #[cfg(not(feature = "force_hash_collisions"))] + fn create_hashes_for_map_arrays() { + let mut builder = + MapBuilder::new(None, StringBuilder::new(), Int32Builder::new()); + // Row 0 + builder.keys().append_value("key1"); + builder.keys().append_value("key2"); + builder.values().append_value(10); + builder.values().append_value(11); + builder.append(true).unwrap(); + // Row 1 + builder.keys().append_value("key1"); + builder.keys().append_value("key2"); + builder.values().append_value(10); + builder.values().append_value(11); + builder.append(true).unwrap(); + // Row 2 + builder.keys().append_value("key1"); + builder.keys().append_value("key2"); + builder.values().append_value(10); + builder.values().append_value(12); + builder.append(true).unwrap(); + // Row 3 + builder.keys().append_value("key1"); + builder.keys().append_value("key3"); + builder.values().append_value(10); + builder.values().append_value(11); + builder.append(true).unwrap(); + // Row 4 + builder.keys().append_value("key1"); + builder.values().append_value(10); + builder.append(true).unwrap(); + // Row 5 + builder.keys().append_value("key1"); + builder.values().append_null(); + builder.append(true).unwrap(); + // Row 6 + builder.append(true).unwrap(); + // Row 7 + builder.keys().append_value("key1"); + builder.values().append_value(10); + builder.append(false).unwrap(); + + let array = Arc::new(builder.finish()) as ArrayRef; + + let random_state = RandomState::with_seeds(0, 0, 0, 0); + let mut hashes = vec![0; array.len()]; + create_hashes(&[array], &random_state, &mut hashes).unwrap(); + assert_eq!(hashes[0], hashes[1]); // same value + assert_ne!(hashes[0], hashes[2]); // different key + assert_ne!(hashes[0], hashes[3]); // different value Review Comment: ```suggestion assert_ne!(hashes[0], hashes[2]); // different value assert_ne!(hashes[0], hashes[3]); // different key ``` I guess the comments are wrong. The difference between `Row 0` and `Row 2` is the value of `key2`: `{'key2': 11}` in Row 0 and `{'key2': 12}` in Row 2. However, the difference between `Row 0` and `Row 3` is the key: `key2` in Row 0 and `key3` in Row 3. Did I say that correctly? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org