brancz commented on PR #6873:
URL: https://github.com/apache/arrow-rs/pull/6873#issuecomment-2543146149

   Ok I also took the time to analyze every single allow deprecate in tests and 
tl;dr is there are only 2 tests that actually test the dictionary ID ending up 
on the wire, but as a side effect, they actually want to test something else.
   
   <details>
   <summary>Here is my full analysis:</summary>
   
   tests in `arrow-flight/src/encode.rs`:
   * `test_schema_metadata_encoded`: doesn't care about 
new_with_preserve_dict_id, can be changed to new
   * `flight_data_from_arrow_batch` uses default already
   
   test in `arrow-integration-test/src/schema.rs`: just marshals whatever the 
field is, it doesn't care about the dict_id (simply happens to use new_dict)
   test in 
`arrow-integration-testing/src/flight_server_scenarios/integration_test.rs`: 
uses new_with_preserve_dict_id, can be changed to regular `new`
   test in `arrow-ipc/src/convert.rs`: just tests roundtripping, doesn't care 
about dict_id (simply happens to use new_dict)
   tests in `arrow-ipc/src/reader.rs`: 
   * `test_roundtrip_nested_dict_no_preserve_dict_id`: uses 
new_with_preserve_dict_id, can be changed to regular `new`
   * `test_roundtrip_stream_nested_dict_of_map_of_dict`: doesn't care about 
dict_id (simply happens to use new_dict)
   * `test_roundtrip_stream_dict_of_list_of_dict`: doesn't care about dict_id 
(simply happens to use new_dict)
   * `test_roundtrip_stream_dict_of_fixed_size_list_of_dict`: doesn't care 
about dict_id (simply happens to use new_dict)
   * `test_roundtrip_view_types_nested_dict`: doesn't care about dict_id 
(simply happens to use new_dict)
   * `test_unaligned`: uses new_with_preserve_dict_id, can be changed to 
regular `new`
   * `test_unaligned_throws_error_with_require_alignment`: uses 
new_with_preserve_dict_id, can be changed to regular `new`
   * `test_same_dict_id_without_preserve`: uses new_with_preserve_dict_id, can 
be changed to regular `new`, but also uses `new_dict`
   * `test_read_ree_dict_record_batches_from_buffer`: doesn't care about 
`dict_id` and IpcWriteOptions can be changed to default
   * `track_union_nested_dict`, `track_struct_nested_dict`: actually tests 
dict_id to be set on the wire (but not sure why it doesn't need to for the 
purpose of the test)
   * `test_dictionary_ordered`: uses `new_dict` but doesn't care about its value
   
   tests in `arrow-schema/src/field.rs`:
   * need to look into uses of `test_new_dict_with_string` and 
`schema_field_with_dict_id`, `person_schema`
   * `test_fields_with_dict_id`, `test_field_comparison_case`, 
`schema_field_accessors`: specifically tests the existance of the 
functions/fields we want to remove (not the functionality)
   
   tests in `parquet/src/arrow/arrow_writer/mod.rs`:
   * `arrow_writer_string_dictionary`, `arrow_writer_primitive_dictionary`, 
`arrow_writer_string_dictionary_unsigned_index`: just tests roundtripping, 
doesn't care about dict_id (simply happens to use new_dict)
   
   tests in `parquet/src/arrow/schema/mod.rs`:
   * `test_arrow_schema_roundtrip`: doesn't care about new_dict, simply calls 
it.
   </details>
   
   Originally coming from Arrow's implementation in Go I can also attest that 
this is the only library around that I've found to be exposing this and not 
just making it a detail of the wire format.
   
   After going through this in detail, I conclude these deprecations are safe 
to do, and we should follow up with testing dictionary equality (in the now 
default mode) to actually set dictionary IDs equal for equal dictionaries, 
however, that doesn't influence the APIs, so in my opinion we can go ahead with 
this as is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to