viirya opened a new issue, #1646:
URL: https://github.com/apache/arrow-rs/issues/1646

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   I discovered this issue while debugging the 
`generate_nested_dictionary_case` integration failure in the C++/Rust 
integration tests. The schema in the Arrow file produced by C++ differs from 
the schema read from the JSON file on the Rust side only in `dict_id`. When 
`dict_id` is excluded from the equality comparison of `Field`, the schemas and 
record batches are exactly the same.
   
   The C++ 
[implementation](https://github.com/apache/arrow/blob/942f77e5c52412694cb78cd4eca96d559475906e/cpp/src/arrow/type.cc)
 of `Field` does not contain dictionary-related properties such as `dict_id`, 
so its equality comparison does not take them into account.
   
   In the Arrow 
[spec](https://arrow.apache.org/docs/format/Columnar.html#dictionary-encoded-layout),
 I don't see `id` specified as part of the dictionary-encoded layout; it is 
only mentioned in the dictionary message of the IPC format, where it is used to 
identify which field in the schema a dictionary belongs to. So it seems to me 
that `dict_id` does not need to be part of the equality comparison of `Field`. 
It only needs to be consistent between the dictionary-encoded data and the 
schema, so that the correct dictionary can be matched to the correct field.
    
   **Describe the solution you'd like**
   Exclude `dict_id` and `dict_is_ordered` from the equality comparison of `Field`.
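   A minimal sketch of the proposed change, assuming a simplified, hypothetical 
`Field` struct (the real arrow-rs `Field` has more members, and `data_type` is 
reduced to a `String` here): `PartialEq` compares every member except 
`dict_id` and `dict_is_ordered`, and `Hash` is written by hand over the same 
members so that equal fields still hash equally.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a schema field, not the real arrow-rs type.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    /// Data type reduced to a string for brevity in this sketch.
    pub data_type: String,
    pub nullable: bool,
    /// Only used to match IPC dictionary batches to this field.
    pub dict_id: i64,
    /// Whether the dictionary's ordering is meaningful.
    pub dict_is_ordered: bool,
    pub metadata: HashMap<String, String>,
}

impl PartialEq for Field {
    fn eq(&self, other: &Self) -> bool {
        // `dict_id` and `dict_is_ordered` are deliberately not compared.
        self.name == other.name
            && self.data_type == other.data_type
            && self.nullable == other.nullable
            && self.metadata == other.metadata
    }
}

impl Eq for Field {}

impl Hash for Field {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the members compared in `eq`.
        self.name.hash(state);
        self.data_type.hash(state);
        self.nullable.hash(state);
        // HashMap does not implement Hash, so hash its entries in
        // sorted key order to get a deterministic result.
        let mut entries: Vec<(&String, &String)> = self.metadata.iter().collect();
        entries.sort();
        entries.hash(state);
    }
}
```

Keeping a derived `Hash` alongside the hand-written `PartialEq` would break the 
rule that `a == b` implies equal hashes, so both impls have to skip the same 
members.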
   
   **Describe alternatives you've considered**
   
   
   **Additional context**
   
   

