rustyconover opened a new pull request, #49262:
URL: https://github.com/apache/arrow/pull/49262

   This adds low-level APIs for working with IPC dictionary messages outside of 
the stream/file reader/writer context, enabling message-at-a-time IPC workflows 
with dictionary-encoded data.  My use case serializes record batches that 
contain dictionaries to shared memory, so I need additional methods to handle 
dictionary IPC messages.
   
   This addresses the issues I raised in #49258.
   
   C++ changes:
   - Add public `ReadDictionary(Message, DictionaryMemo*, IpcReadOptions)` to 
read a single dictionary message into a memo
   - Add `CollectAndSerializeDictionaries(RecordBatch, DictionaryMemo*, 
IpcWriteOptions)` to serialize dictionary messages with pointer-based 
deduplication
   - Expose `dictionary_memo()` accessor on `RecordBatchStreamReader` and 
`RecordBatchFileReader`
   - Refactor internal `ReadDictionary` to `ReadDictionaryMessage` in 
`StreamDecoderInternal`; make `dictionary_memo_` protected
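
To illustrate what "pointer-based deduplication" means in the list above, here is a toy Python model. It is not the Arrow implementation and does not use any Arrow API: `MemoSketch` and `collect_dictionary_messages` are hypothetical names invented for this sketch, and plain lists stand in for dictionary arrays. It assumes that a dictionary is skipped only when the memo already holds the very same object, so an equal but distinct copy is emitted again.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoSketch:
    """Toy stand-in for a dictionary memo: dictionary id -> last object emitted."""
    sent: Dict[int, list] = field(default_factory=dict)


def collect_dictionary_messages(
    dictionaries: Dict[int, list], memo: MemoSketch
) -> List[Tuple[int, list]]:
    """Return (dictionary id, values) pairs that still need to be sent.

    A dictionary is skipped when the memo already holds the *same object*
    (identity check with `is`), which models deduplication by pointer
    rather than by value.
    """
    messages = []
    for dict_id, values in dictionaries.items():
        if memo.sent.get(dict_id) is values:
            continue  # same object was already emitted for this id
        memo.sent[dict_id] = values  # remember what was emitted last
        messages.append((dict_id, values))
    return messages


shared = ["red", "green", "blue"]
memo = MemoSketch()

first = collect_dictionary_messages({0: shared}, memo)       # emitted
second = collect_dictionary_messages({0: shared}, memo)      # skipped
equal_copy = list(shared)                                    # equal, but a new object
third = collect_dictionary_messages({0: equal_copy}, memo)   # emitted again

print(len(first), len(second), len(third))  # prints: 1 0 1
```

Reusing the same memo across batches is what lets the second call produce no messages. Whether the real implementation also compares dictionaries by value is not something this sketch tries to model.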
   
   Python changes:
   - Add `ipc.read_dictionary_message()` to populate a `DictionaryMemo` from a 
dictionary `Message` or `Buffer`
   - Add `RecordBatch.serialize_dictionaries()` to serialize dictionary IPC 
messages with memo-based deduplication
   - Add `dictionary_memo` property on `RecordBatchStreamReader` and 
`RecordBatchFileReader`
   - Add `DictionaryMemo.wrap()` for non-owning references to reader memos
   - Add `read_dictionary_message` to API docs
   - Comprehensive test coverage for all new APIs
   
   **AI Disclosure: I used Claude to help me prepare this diff and PR.  I will be 
responsible for all bugs, problems or inconsistencies.**
   
   ### Are there any user-facing changes?
   
   Yes, they are documented above in the Python changes.
   
   **This PR includes breaking changes to public APIs.**
   
   I'm making `dictionary_memo_` on `StreamDecoderInternal` protected.
   