rluvaton opened a new issue, #9475:
URL: https://github.com/apache/arrow-rs/issues/9475

   **Describe the bug**
   According to the Arrow specification, map keys should be unique:
   > [..] for ensuring that the keys are hashable and unique
   > 
https://github.com/apache/arrow/blob/cbe2618431e413f12aa16aeba88b3a98914f194b/format/Schema.fbs#L124
   
   but `MapArray` does not validate that the keys are actually unique
   
   **To Reproduce**
   ```rust
   use std::sync::Arc;

   use arrow::array::{ArrayRef, Int32Array, MapArray, StringArray, StructArray};
   use arrow::buffer::OffsetBuffer;
   use arrow::datatypes::{DataType, Field, Fields};

   #[test]
   fn should_fail_to_create_map_with_duplicate_keys() {
       let struct_fields = Fields::from(vec![
           Field::new("keys", DataType::Int32, true),
           Field::new("values", DataType::Utf8, true)
       ]);
       let map_array = MapArray::try_new(
           Arc::new(Field::new(
               "entries",
               DataType::Struct(
                   struct_fields.clone()
               ),
               true
           )),
           OffsetBuffer::<i32>::from_lengths(std::iter::once(2)),
           StructArray::new(
               struct_fields,
               vec![
                   Arc::new(Int32Array::from(vec![1, 1])) as ArrayRef,
                   Arc::new(StringArray::from(vec!["hello", "world"])) as ArrayRef,
               ],
               None
           ),
           None,
           false
       ).expect_err("should fail to create map with duplicate keys");
   }
   ```
   
   **Expected behavior**
   The creation should fail
   
   **Additional context**
   I know that this is expensive and hard to do, since the crate can't really
   depend on other crates, but you should not be able to create arrays that do
   not match the spec using safe functions
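
To make the cost of such a check concrete, here is a minimal sketch of a per-slot uniqueness scan using only the standard library. The function name `first_slot_with_duplicate_key` and the plain-slice inputs are hypothetical and are not part of the arrow-rs API. It assumes `i32` keys and offsets already converted to `usize`; a real check would have to handle arbitrary key types.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not arrow-rs API): returns the index of the first map
/// slot that contains a repeated key, or `None` if every slot has unique keys.
/// `offsets` follows the Arrow list layout: slot `i` covers
/// `keys[offsets[i]..offsets[i + 1]]`.
fn first_slot_with_duplicate_key(offsets: &[usize], keys: &[i32]) -> Option<usize> {
    let mut seen = HashSet::new();
    for (slot, window) in offsets.windows(2).enumerate() {
        // Uniqueness is only required within a single map slot.
        seen.clear();
        for key in &keys[window[0]..window[1]] {
            // `insert` returns false when the key is already in the set.
            if !seen.insert(*key) {
                return Some(slot);
            }
        }
    }
    None
}
```

For the reproduction above, offsets `[0, 2]` with keys `[1, 1]` would flag slot 0. The scan is linear in the number of keys but needs hashing for every key type, which is where the cost comes from.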
