JakeDern commented on code in PR #8001: URL: https://github.com/apache/arrow-rs/pull/8001#discussion_r2264209301
########## arrow-ipc/src/writer.rs: ########## @@ -760,34 +812,108 @@ impl DictionaryTracker { /// * If the tracker has not been configured to error on replacement or this dictionary /// has never been seen before, return `Ok(true)` to indicate that the dictionary was just /// inserted. - pub fn insert(&mut self, dict_id: i64, column: &ArrayRef) -> Result<bool, ArrowError> { - let dict_data = column.to_data(); - let dict_values = &dict_data.child_data()[0]; - - // If a dictionary with this id was already emitted, check if it was the same. - if let Some(last) = self.written.get(&dict_id) { - if ArrayData::ptr_eq(&last.child_data()[0], dict_values) { - // Same dictionary values => no need to emit it again - return Ok(false); + pub fn insert( Review Comment: On second thought, I don't think my idea is good because the handling definitely belongs on the `IpcWriteOptions` and often times those options are passed around along with the dictionary tracker. I think it would be confusing if it were possible to set one `DictionaryHandling` type on the tracker and another on the write options and not really know which is used without reading the code. Maybe in a future release we can take a look at the overall API and how the responsibilities of the dictionary tracker and the various writers intersect. I've taken your suggestion to pass in the DictionaryHandling directly instead of a boolean and also add a new method while deprecating the old to get this into a sooner release if possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org