alamb commented on PR #11361:
URL: https://github.com/apache/datafusion/pull/11361#issuecomment-2221111658

   > I noticed the map and struct of Arrow allow duplicate keys but I think 
this behavior is wrong.
   
   I did some research
   
   For `StructArray`s, 
https://arrow.apache.org/docs/format/Columnar.html#struct-layout doesn't say 
anything about requiring the field names to be unique. However, a struct with 
duplicated field names is certainly going to cause issues for many applications 
(including DataFusion).
   
   For maps, the spec says the keys should be unique, and that the application 
(here, DataFusion) is responsible for enforcing it:
   
   
https://github.com/apache/arrow/blob/5e451d85d7269d3fb9c7eaab06caece5718c40e5/format/Schema.fbs#L117-L145
   
   > /// In this layout, the keys and values are each respectively contiguous. We do
   > /// not constrain the key and value types, so the application is responsible
   > /// for ensuring that the keys are hashable and unique. Whether the keys are sorted
   > /// may be set in the metadata for this field.
   
   
   
   So my suggestion is:
   1. File a separate DataFusion ticket about the fact that `named_struct` 
allows repeated field names.
   2. Either file a new ticket or update this PR to enforce unique map keys in 
the `map`/`make_map` functions.
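
   To make the suggested check concrete, here is a minimal sketch of what rejecting duplicates could look like. It is not DataFusion's actual code: `first_duplicate` and `ensure_unique` are hypothetical helpers, and they work on plain slices rather than Arrow arrays, so a real implementation would need to extract the keys (or field names) first.

```rust
use std::collections::HashSet;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical helper: returns the first item that repeats an earlier one,
/// or `None` if every item is unique.
pub fn first_duplicate<T: Hash + Eq>(items: &[T]) -> Option<&T> {
    let mut seen = HashSet::with_capacity(items.len());
    // `HashSet::insert` returns false when the value was already present.
    items.iter().find(|item| !seen.insert(*item))
}

/// Hypothetical helper: turns a duplicate into an error message.
/// `kind` is only used for the message, e.g. "map key" or "field name".
pub fn ensure_unique<T: Hash + Eq + Debug>(kind: &str, items: &[T]) -> Result<(), String> {
    match first_duplicate(items) {
        Some(dup) => Err(format!("duplicate {}: {:?}", kind, dup)),
        None => Ok(()),
    }
}
```

   The same helper would cover both cases above: call it with the field names for `named_struct` and with the keys for `map`/`make_map`, and return the error before building the array.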
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

