rustyconover opened a new pull request, #49262: URL: https://github.com/apache/arrow/pull/49262
This adds low-level APIs for working with IPC dictionary messages outside of the stream/file reader/writer context, enabling message-at-a-time IPC workflows with dictionary-encoded data. I'm working with record batches that contain dictionaries and are serialized to shared memory, so I need additional methods to handle dictionary IPC messages. This addresses my issues from #49258.

C++ changes:
- Add public `ReadDictionary(Message, DictionaryMemo*, IpcReadOptions)` to read a single dictionary message into a memo
- Add `CollectAndSerializeDictionaries(RecordBatch, DictionaryMemo*, IpcWriteOptions)` to serialize dictionary messages with pointer-based deduplication
- Expose `dictionary_memo()` accessor on `RecordBatchStreamReader` and `RecordBatchFileReader`
- Refactor internal `ReadDictionary` to `ReadDictionaryMessage` in `StreamDecoderInternal`; make `dictionary_memo_` protected

Python changes:
- Add `ipc.read_dictionary_message()` to populate a `DictionaryMemo` from a dictionary `Message` or `Buffer`
- Add `RecordBatch.serialize_dictionaries()` to serialize dictionary IPC messages with memo-based deduplication
- Add `dictionary_memo` property on `RecordBatchStreamReader` and `RecordBatchFileReader`
- Add `DictionaryMemo.wrap()` for non-owning references to reader memos
- Add `read_dictionary_message` to API docs
- Comprehensive test coverage for all new APIs

**AI Disclosure: I used Claude to help me prepare this diff and PR. I will be responsible for all bugs, problems or inconsistencies.**

### Are there any user-facing changes?

Yes, they are documented above in the Python changes.

**This PR includes breaking changes to public APIs.** I'm making `dictionary_memo_` on `StreamDecoderInternal` protected.
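To illustrate what "pointer-based deduplication" means for the serialization helpers described above, here is a minimal toy sketch in plain Python. It does not use pyarrow or the APIs added in this PR; `ToyMemo` and `collect_dictionaries` are hypothetical names invented for the illustration, and the assumption is that a dictionary is re-sent only when the object behind a field id changes identity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyMemo:
    """Hypothetical stand-in for a dictionary memo: field id -> last dictionary object sent."""
    sent: Dict[int, object] = field(default_factory=dict)


def collect_dictionaries(batch_dicts: Dict[int, list], memo: ToyMemo) -> List[Tuple[int, list]]:
    """Return the (field_id, values) pairs that still need a dictionary message.

    Deduplication compares object identity (`is`), not contents, which mirrors
    the idea of comparing pointers. The memo keeps a reference to each
    dictionary it has seen, so the identity check stays valid.
    """
    pending = []
    for field_id, values in batch_dicts.items():
        if memo.sent.get(field_id) is values:
            continue  # same object already sent for this field: skip it
        memo.sent[field_id] = values
        pending.append((field_id, values))
    return pending


colors = ["red", "green"]
memo = ToyMemo()

first = collect_dictionaries({0: colors}, memo)         # new dictionary: emitted
second = collect_dictionaries({0: colors}, memo)        # same object: skipped
third = collect_dictionaries({0: list(colors)}, memo)   # equal contents, new object: emitted again
```

In this sketch, identity comparison is cheap because it never inspects dictionary contents, at the cost of re-sending a dictionary that is equal but stored in a different object. Whether the PR's implementation makes exactly the same trade-off is an assumption here.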
