alamb commented on issue #3389:
URL: https://github.com/apache/arrow-rs/issues/3389#issuecomment-1368902145

   > You might want to think about only keeping the last dictionary for each 
column, to constrain memory growth and keep things simple (you wouldn't need a 
map).
   
   Good idea. In fact, this may be what the IPC format requires (the 
`dictionary_id` is *per field* rather than per *batch*, as I had been thinking). 
I believe the spec calls a dictionary batch that adds values to an existing 
field's dictionary a "delta dictionary batch".
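A minimal sketch of the "keep only the last dictionary per column" idea, in plain `std` Rust rather than the arrow-rs API. `LastDictionaryTracker` and `DictionaryMessage` are hypothetical names, and dictionaries are modeled as `Vec<String>` instead of Arrow arrays. Each column gets one slot indexed by position (no map), and on each batch the tracker decides whether to send nothing, a delta with only the appended values, or the full dictionary.

```rust
/// What the writer should emit for one column's dictionary on this batch.
#[derive(Debug, PartialEq)]
enum DictionaryMessage {
    /// Same dictionary as last time: nothing to send.
    Unchanged,
    /// Previous dictionary is a prefix of the new one: send only the new tail.
    Delta(Vec<String>),
    /// First dictionary for the column, or a change that is not a pure append.
    Full(Vec<String>),
}

/// Hypothetical writer-side state: the last dictionary sent, one slot per column.
struct LastDictionaryTracker {
    last: Vec<Option<Vec<String>>>,
}

impl LastDictionaryTracker {
    fn new(num_columns: usize) -> Self {
        Self { last: vec![None; num_columns] }
    }

    /// Compares `dict` with the last dictionary sent for `column`, records
    /// `dict` as the new "last", and returns what needs to be sent.
    fn update(&mut self, column: usize, dict: &[String]) -> DictionaryMessage {
        let msg = match &self.last[column] {
            Some(prev) if prev.as_slice() == dict => DictionaryMessage::Unchanged,
            Some(prev) if dict.starts_with(prev) => {
                DictionaryMessage::Delta(dict[prev.len()..].to_vec())
            }
            _ => DictionaryMessage::Full(dict.to_vec()),
        };
        // Only the most recent dictionary is kept, bounding memory per column.
        self.last[column] = Some(dict.to_vec());
        msg
    }
}
```

In this sketch only pure appends become deltas; any other change falls back to resending the whole dictionary, assuming the receiver replaces a column's dictionary when it sees a non-delta batch.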
   
   However, it appears that the arrow-ipc reader doesn't support this yet:
   
   
https://github.com/apache/arrow-rs/blob/b371f41f338737b4e214d74b48e18939f5643a84/arrow-ipc/src/reader.rs#L670-L673
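A rough sketch of what reader-side handling could look like, assuming the IPC stream-format semantics: a dictionary batch with `isDelta` set appends its values to the dictionary already registered under that id, and a non-delta batch installs or replaces it. `DictionaryBatchMsg` and `ReaderDictionaries` are hypothetical stand-ins, not arrow-rs types, and values are plain `String`s rather than Arrow arrays.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for an IPC dictionary batch message.
struct DictionaryBatchMsg {
    id: i64,
    is_delta: bool,
    values: Vec<String>,
}

/// Hypothetical reader-side state: the current dictionary for each id.
#[derive(Default)]
struct ReaderDictionaries {
    by_id: HashMap<i64, Vec<String>>,
}

impl ReaderDictionaries {
    /// Applies one dictionary batch to the reader's state.
    fn apply(&mut self, batch: DictionaryBatchMsg) -> Result<(), String> {
        if batch.is_delta {
            // Delta: append the new values to the existing dictionary.
            match self.by_id.get_mut(&batch.id) {
                Some(existing) => {
                    existing.extend(batch.values);
                    Ok(())
                }
                None => Err(format!("delta for unknown dictionary id {}", batch.id)),
            }
        } else {
            // Non-delta: install (or replace) the dictionary for this id.
            self.by_id.insert(batch.id, batch.values);
            Ok(())
        }
    }

    /// Returns the current dictionary for `id`, if any.
    fn get(&self, id: i64) -> Option<&[String]> {
        self.by_id.get(&id).map(|v| v.as_slice())
    }
}
```

Rejecting a delta for an id that has not been seen yet is a choice of this sketch: a delta only makes sense relative to an existing dictionary.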
   
   I will probably start with some arrow-ipc test cleanup (which will 
convince me we have coverage in the integration tests), then move on to 
adding support for delta dictionary batches in the IPC reader/writer, and 
then add support to Flight.

