JakeDern opened a new issue, #8134: URL: https://github.com/apache/arrow-rs/issues/8134
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

This issue captures the conversation [here](https://github.com/apache/arrow-rs/pull/8001) about ways to improve the efficiency of delta dictionary emission in the future. That thread discusses the conditions under which we can emit delta dictionaries and some efficiency concerns with doing so, which I think stem mostly from the separation of the record batch builder from the IPC writer itself.

**Describe the solution you'd like**

One idea I had is roughly described [here](https://github.com/apache/arrow-rs/pull/8001#issuecomment-3184791114). I'll copy the most relevant part here, which is that we could:

1. Have an IPC stream writer that owns the record batch building, rather than just having finished record batches pushed into it.
2. Add APIs to the dictionary builders to track and emit the incremental deltas for whatever they're building instead of the full set.

That way, every time the IPC writer seals a batch, it knows for sure that it has the exact delta for every dictionary, since it owns the record batch production and controls what the dictionary builders do.

**Describe alternatives you've considered**

We could also have the IPC writer re-intern the dictionaries and get good deltas that way, but that would require a lot of extra hashing and be expensive.
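
To make the proposed solution a bit more concrete, here is a minimal, std-only sketch of the two pieces: a dictionary builder that remembers how many values it has already emitted, and a stream writer that owns that builder and asks it for the delta each time it seals a batch. Everything here is hypothetical and for illustration only: `DeltaDictBuilder`, `OwningStreamWriter`, `Message`, `finish_delta`, and `seal_batch` are not arrow-rs APIs, and plain `Vec`s stand in for real Arrow arrays and IPC messages. The only detail borrowed from the Arrow format is that a dictionary message carries an `isDelta` flag.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a string dictionary builder (not the arrow-rs API).
#[derive(Default)]
struct DeltaDictBuilder {
    /// Interned value -> dictionary key.
    lookup: HashMap<String, u32>,
    /// Dictionary values in key order; never reset, so keys stay stable.
    values: Vec<String>,
    /// Keys for the rows of the batch currently being built.
    keys: Vec<u32>,
    /// Number of dictionary values already handed to the writer.
    emitted: usize,
}

impl DeltaDictBuilder {
    /// Interns `value` (hashing it once) and records its key for the current batch.
    fn append(&mut self, value: &str) {
        let key = match self.lookup.get(value) {
            Some(&k) => k,
            None => {
                let k = self.values.len() as u32;
                self.values.push(value.to_string());
                self.lookup.insert(value.to_string(), k);
                k
            }
        };
        self.keys.push(key);
    }

    /// Returns the current batch's keys and only the values added since the
    /// previous call, then advances the "already emitted" watermark.
    fn finish_delta(&mut self) -> (Vec<u32>, Vec<String>) {
        let new_values = self.values[self.emitted..].to_vec();
        self.emitted = self.values.len();
        (std::mem::take(&mut self.keys), new_values)
    }
}

/// Hypothetical messages the writer would serialize to the stream.
#[derive(Debug, PartialEq)]
enum Message {
    Dictionary { values: Vec<String>, is_delta: bool },
    Batch { keys: Vec<u32> },
}

/// Hypothetical writer that owns batch building instead of receiving
/// finished record batches.
#[derive(Default)]
struct OwningStreamWriter {
    builder: DeltaDictBuilder,
    out: Vec<Message>,
}

impl OwningStreamWriter {
    fn append(&mut self, value: &str) {
        self.builder.append(value);
    }

    /// Seals the current batch. Because the writer owns the builder, the
    /// values it gets back are exactly the new dictionary entries, with no
    /// re-interning or comparison against previously written dictionaries.
    fn seal_batch(&mut self) {
        // The first dictionary on the stream is a full one; later ones are deltas.
        let is_delta = self.builder.emitted > 0;
        let (keys, new_values) = self.builder.finish_delta();
        if !new_values.is_empty() {
            self.out.push(Message::Dictionary { values: new_values, is_delta });
        }
        self.out.push(Message::Batch { keys });
    }
}
```

In this sketch, appending `a, b, a`, sealing, then appending `b, c` and sealing again produces a full dictionary `[a, b]`, a batch, a delta dictionary containing only `[c]`, and a second batch. A batch that introduces no new values produces no dictionary message at all. The values are hashed once, at append time, which is the cost the re-interning alternative would pay a second time in the writer.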
