brancz commented on PR #6873: URL: https://github.com/apache/arrow-rs/pull/6873#issuecomment-2543146149
Ok I also took the time to analyze every single allow deprecate in tests and tl;dr is there are only 2 tests that actually test the dictionary ID ending up on the wire, but as a side effect, they actually want to test something else. <details> <summary>Here is my full analysis:</summary> tests in `arrow-flight/src/encode.rs`: * `test_schema_metadata_encoded`: doesn't care about new_with_preserve_dict_id, can be changed to new * `flight_data_from_arrow_batch` uses default already test in `arrow-integration-test/src/schema.rs`: just marshals whatever the field is, it doesn't care about the dict_id (simply happens to use new_dict) test in `arrow-integration-testing/src/flight_server_scenarios/integration_test.rs`: uses new_with_preserve_dict_id, can be changed to regular `new` test in `arrow-ipc/src/convert.rs`: just tests roundtripping, doesn't care about dict_id (simply happens to use new_dict) tests in `arrow-ipc/src/reader.rs`: * `test_roundtrip_nested_dict_no_preserve_dict_id`: uses new_with_preserve_dict_id, can be changed to regular `new` * `test_roundtrip_stream_nested_dict_of_map_of_dict`: doesn't care about dict_id (simply happens to use new_dict) * `test_roundtrip_stream_dict_of_list_of_dict`: doesn't care about dict_id (simply happens to use new_dict) * `test_roundtrip_stream_dict_of_fixed_size_list_of_dict`: doesn't care about dict_id (simply happens to use new_dict) * `test_roundtrip_view_types_nested_dict`: doesn't care about dict_id (simply happens to use new_dict) * `test_unaligned`: uses new_with_preserve_dict_id, can be changed to regular `new` * `test_unaligned_throws_error_with_require_alignment`: uses new_with_preserve_dict_id, can be changed to regular `new` * `test_same_dict_id_without_preserve`: uses new_with_preserve_dict_id, can be changed to regular `new`, but also uses `new_dict` * `test_read_ree_dict_record_batches_from_buffer`: doesn't care about `dict_id` and IpcWriteOptions can be changed to default * `track_union_nested_dict`, `track_struct_nested_dict`: actually tests dict_id to be set on the wire (but not sure why it doesn't need to for the purpose of the test) * `test_dictionary_ordered`: uses `new_dict` but doesn't care about its value tests in `arrow-schema/src/field.rs`: * need to look into uses of `test_new_dict_with_string` and `schema_field_with_dict_id`, `person_schema` * `test_fields_with_dict_id`, `test_field_comparison_case`, `schema_field_accessors`: specifically tests the existance of the functions/fields we want to remove (not the functionality) tests in `parquet/src/arrow/arrow_writer/mod.rs`: * `arrow_writer_string_dictionary`, `arrow_writer_primitive_dictionary`, `arrow_writer_string_dictionary_unsigned_index`: just tests roundtripping, doesn't care about dict_id (simply happens to use new_dict) tests in `parquet/src/arrow/schema/mod.rs`: * `test_arrow_schema_roundtrip`: doesn't care about new_dict, simply calls it. </details> Originally coming from Arrow's implementation in Go I can also attest that this is the only library around that I've found to be exposing this and not just making it a detail of the wire format. After going through this in detail, I conclude these deprecations are safe to do, and we should follow up with testing dictionary equality (in the now default mode) to actually set dictionary IDs equal for equal dictionaries, however, that doesn't influence the APIs, so in my opinion we can go ahead with this as is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org