JakeDern commented on code in PR #8001:
URL: https://github.com/apache/arrow-rs/pull/8001#discussion_r2264209301


##########
arrow-ipc/src/writer.rs:
##########
@@ -760,34 +812,108 @@ impl DictionaryTracker {
     /// * If the tracker has not been configured to error on replacement or 
this dictionary
     ///   has never been seen before, return `Ok(true)` to indicate that the 
dictionary was just
     ///   inserted.
-    pub fn insert(&mut self, dict_id: i64, column: &ArrayRef) -> Result<bool, 
ArrowError> {
-        let dict_data = column.to_data();
-        let dict_values = &dict_data.child_data()[0];
-
-        // If a dictionary with this id was already emitted, check if it was 
the same.
-        if let Some(last) = self.written.get(&dict_id) {
-            if ArrayData::ptr_eq(&last.child_data()[0], dict_values) {
-                // Same dictionary values => no need to emit it again
-                return Ok(false);
+    pub fn insert(

Review Comment:
   On second thought, I don't think my idea is good because the handling 
definitely belongs on the `IpcWriteOptions` and often times those options are 
passed around along with the dictionary tracker.
   
   I think it would be confusing if it were possible to set one 
`DictionaryHandling` type on the tracker and another on the write options and 
not really know which is used without reading the code.
   
   Maybe in a future release we can take a look at the overall API and how the 
responsibilities of the dictionary tracker and the various writers intersect. 
   
   I've taken your suggestion to pass in the DictionaryHandling directly 
instead of a boolean and also add a new method while deprecating the old to get 
this into a sooner release if possible. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to