alamb commented on PR #11361: URL: https://github.com/apache/datafusion/pull/11361#issuecomment-2221111658
> I noticed the map and struct of Arrow allow duplicate keys but I think this behavior is wrong.

I did some research.

For `StructArray`s, https://arrow.apache.org/docs/format/Columnar.html#struct-layout doesn't say anything about requiring the field names to be unique. It is certainly going to cause issues for many applications (including DataFusion) if there is a struct with duplicated field names.

For map, the spec says the keys should be unique, and that uniqueness is enforced by the application (here, DataFusion): https://github.com/apache/arrow/blob/5e451d85d7269d3fb9c7eaab06caece5718c40e5/format/Schema.fbs#L117-L145

> /// In this layout, the keys and values are each respectively contiguous. We do
> /// not constrain the key and value types, so the application is responsible
> /// for ensuring that the keys are hashable and unique. Whether the keys are sorted
> /// may be set in the metadata for this field.

So my suggestion is:
1. File a separate DataFusion ticket about the fact that `named_struct` allows repeated field names
2. Either file a new ticket (or update this PR) to enforce the uniqueness of map keys for the `map`/`make_map` functions
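
To illustrate the kind of uniqueness check suggested for `map`/`make_map`, here is a minimal sketch using only the Rust standard library. It assumes string keys already collected into a slice; `first_duplicate` and `check_unique_keys` are hypothetical names for illustration, not DataFusion or arrow-rs APIs, and a real implementation would need to work on the evaluated key array and handle arbitrary key types.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the first key that appears more than once,
/// or `None` if all keys are unique.
fn first_duplicate<'a>(keys: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    for &key in keys {
        // `insert` returns false when the key was already present.
        if !seen.insert(key) {
            return Some(key);
        }
    }
    None
}

/// Hypothetical validation step: rejects a key list that contains duplicates.
fn check_unique_keys(keys: &[&str]) -> Result<(), String> {
    match first_duplicate(keys) {
        Some(key) => Err(format!("duplicate map key: {}", key)),
        None => Ok(()),
    }
}
```

With this sketch, `check_unique_keys(&["a", "b"])` succeeds, while `check_unique_keys(&["a", "b", "a"])` returns an error naming `a`. The same shape of check could apply to the field names passed to `named_struct`.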