albertlockett commented on code in PR #8001: URL: https://github.com/apache/arrow-rs/pull/8001#discussion_r2273576278
########## arrow-ipc/src/reader.rs: ########## @@ -1597,6 +1660,122 @@ impl<R: Read> RecordBatchReader for StreamReader<R> { } } +/// Representation of a fully parsed IpcMessage from the underlying stream. +/// Parsing this kind of message is done by higher level constructs such as +/// [`StreamReader`], because fully interpreting the messages into a record +/// batch or dictionary batch requires access to stream state such as schema +/// and the full dictionary cache. +#[derive(Debug)] +#[allow(dead_code)] Review Comment: I think I'm probably fine with this for now. FYI, it looks like the `arrow_schema::Schema` contained in the `Schema` variant of this enum is never read. Should it be read in the test? If not we could also do something like this: ```rs pub(crate) enum IpcMessage { Schema, RecordBatch(RecordBatch), #[allow(dead_code)] DictionaryBatch { id: i64, is_delta: bool, values: ArrayRef, }, } ``` (and of course, modify the `next_ipc_message` and `run_sequence_test` accordingly) Another option could be to have the `maybe_next` method of `StreamReader` actually drive the `MessageReader`, and have a separate driver in the `run_sequence_test` function which also drives it and produces these `IpcMessage`. If we do that, we could maybe also remove the `RecordBatch` contained in the `RecordBatch` variant, since it's never read in the test. The downside of that of course is we duplicate some of the dict handling logic here: https://github.com/apache/arrow-rs/blob/34b4c74a187826628be27d0dfba7503104c4ba39/arrow-ipc/src/reader.rs#L1591-L1612 ```rs let version = message.version(); let dict_values = get_dictionary_values( &body.into(), dict, &self.schema, &mut self.dictionaries_by_id, &version, false, self.skip_validation.clone(), )?; update_dictionaries( &mut self.dictionaries_by_id, dict.isDelta(), dict.id(), dict_values.clone(), )?; ``` ########## arrow-ipc/src/reader.rs: ########## @@ -1597,6 +1660,122 @@ impl<R: Read> RecordBatchReader for StreamReader<R> { } } +/// Representation of a fully parsed IpcMessage from the underlying stream. +/// Parsing this kind of message is done by higher level constructs such as +/// [`StreamReader`], because fully interpreting the messages into a record +/// batch or dictionary batch requires access to stream state such as schema +/// and the full dictionary cache. +#[derive(Debug)] +#[allow(dead_code)] Review Comment: I think I'm probably fine with this for now. FYI, it looks like the `arrow_schema::Schema` contained in the `Schema` variant of this enum is never read. Should it be read in the test? If not we could also do something like this: ```rs pub(crate) enum IpcMessage { Schema, RecordBatch(RecordBatch), #[allow(dead_code)] DictionaryBatch { id: i64, is_delta: bool, values: ArrayRef, }, } ``` (and of course, modify the `next_ipc_message` and `run_sequence_test` accordingly) Another option could be to have the `maybe_next` method of `StreamReader` actually drive the `MessageReader`, and have a separate driver in the `run_sequence_test` function which also drives it and produces these `IpcMessage`. If we do that, we could maybe also remove the `RecordBatch` contained in the `RecordBatch` variant, since it's never read in the test. The downside of that of course is we duplicate some of the dict handling logic here: https://github.com/apache/arrow-rs/blob/34b4c74a187826628be27d0dfba7503104c4ba39/arrow-ipc/src/reader.rs#L1591-L1612 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org