lidavidm commented on a change in pull request #9656:
URL: https://github.com/apache/arrow/pull/9656#discussion_r590508192
##########
File path: cpp/src/arrow/ipc/reader.cc
##########
@@ -1022,6 +1209,31 @@ class RecordBatchFileReaderImpl : public
RecordBatchFileReader {
ReadStats stats() const override { return stats_; }
+ Result<AsyncGenerator<std::shared_ptr<RecordBatch>>> GetRecordBatchGenerator(
+ int readahead_messages, const io::IOContext& io_context) override {
+ auto state = std::make_shared<IpcFileRecordBatchGeneratorState>();
+ state->num_dictionaries_ = num_dictionaries();
+ state->num_record_batches_ = num_record_batches();
+ state->file_ = file_;
+ state->options_ = options_;
+ state->owned_file_ = owned_file_;
+ state->footer_buffer_ = footer_buffer_;
+ state->footer_ = footer_;
+ // Must regenerate uncopyable DictionaryMemo
+ RETURN_NOT_OK(UnpackSchemaMessage(state->footer_->schema(),
state->options_,
+ &state->dictionary_memo_,
&state->schema_,
+ &state->out_schema_,
&state->field_inclusion_mask_,
+ &state->swap_endian_));
+ AsyncGenerator<std::shared_ptr<Message>> message_generator =
+ IpcMessageGenerator(state, io_context);
+ if (readahead_messages > 0) {
+ message_generator =
+ MakeReadaheadGenerator(std::move(message_generator),
readahead_messages);
+ }
+ return IpcFileRecordBatchGenerator(state, message_generator,
+ arrow::internal::GetCpuThreadPool());
Review comment:
Usually not, but it would be in the case of compressed buffers. I could
also change it to not offload onto a secondary pool by default (and hence do
the work on the same thread used to read data) and/or benchmark if there's any
overhead to this.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]