alamb commented on issue #3389: URL: https://github.com/apache/arrow-rs/issues/3389#issuecomment-1368902145
> You might want to think about only keeping the last dictionary for each column, to constrain memory growth and keep things simple (you wouldn't need a map).

Good idea. In fact, this may be what the IPC format requires (the dictionary_id is *per field* rather than *per batch*, as I was thinking). I believe the spec calls a new dictionary for the same field a "delta dictionary batch".

However, it appears that the arrow ipc reader doesn't actually support this yet: https://github.com/apache/arrow-rs/blob/b371f41f338737b4e214d74b48e18939f5643a84/arrow-ipc/src/reader.rs#L670-L673

I will probably start my work with some arrow-ipc test cleanup (which will convince me we have coverage in the integration tests), then move on to actually adding support for delta dictionary batches in the ipc reader/writer, and then I can add support to flight.
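To make the "keep only the last dictionary per column" idea concrete, here is a minimal std-only sketch. It is not arrow-rs code: the `LastDictionaries` type, its methods, and the use of `Vec<String>` as a stand-in for dictionary values are all hypothetical, and the `is_delta` parameter only mirrors the `isDelta` flag that an IPC dictionary batch carries. Treating a delta that arrives before any dictionary as the initial dictionary is also an assumption of this sketch.

```rust
/// Hypothetical store that keeps only the most recent dictionary per column.
/// A plain `Vec` indexed by column position is enough, so no map is needed.
#[derive(Debug, Default)]
struct LastDictionaries {
    // One slot per column; `None` until a dictionary has been seen.
    slots: Vec<Option<Vec<String>>>,
}

impl LastDictionaries {
    fn new(num_columns: usize) -> Self {
        Self { slots: vec![None; num_columns] }
    }

    /// Apply an incoming dictionary for `column`.
    /// If `is_delta` is set and a dictionary already exists, the new values
    /// are appended to it. Otherwise the old dictionary is dropped and
    /// replaced, which bounds memory to one dictionary per column.
    fn update(&mut self, column: usize, values: Vec<String>, is_delta: bool) {
        if is_delta {
            if let Some(existing) = self.slots[column].as_mut() {
                existing.extend(values);
                return;
            }
        }
        // Assumption: a delta with no prior dictionary acts as the first one.
        self.slots[column] = Some(values);
    }

    /// Current dictionary values for `column`, if any have been received.
    fn get(&self, column: usize) -> Option<&[String]> {
        self.slots[column].as_deref()
    }
}
```

With this shape, a caller would invoke `update` once per incoming dictionary and `get` when decoding a batch. A real implementation would hold Arrow arrays rather than strings and would have to decide how to handle a dictionary that is still referenced by earlier batches.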
