Light-City commented on code in PR #37896:
URL: https://github.com/apache/arrow/pull/37896#discussion_r1358097865
##########
cpp/src/arrow/record_batch.cc:
##########
@@ -432,4 +433,43 @@ RecordBatchReader::~RecordBatchReader() {
 ARROW_WARN_NOT_OK(this->Close(), "Implicitly called RecordBatchReader::Close failed");
}
+Result<std::shared_ptr<RecordBatch>> ConcatenateRecordBatches(
+ const RecordBatchVector& batches, MemoryPool* pool) {
+ int64_t length = 0;
+ size_t n = batches.size();
+ if (n == 0) {
+ return Status::Invalid("Must pass at least one recordbatch");
+ }
+ if (n == 1) {
+ return batches[0];
+ }
+ int cols = batches[0]->num_columns();
+ auto schema = batches[0]->schema();
+ std::vector<std::shared_ptr<Array>> columns;
+ if (cols == 0) {
+ // special case: null batch, no data, just length
+ for (size_t i = 0; i < batches.size(); ++i) {
+ length += batches[i]->num_rows();
+ }
+ } else {
Review Comment:
The code looks very clean, but it does not handle the case of cols = 0.
When cols = 0, length is never computed.
A batch can have cols = 0 and an empty schema and still have a length.
Such a batch is used to process count(*): it stores no actual columns,
only the number of rows.
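
To illustrate the point, here is a minimal sketch that does not use the Arrow API. `ToyBatch` and `ConcatenateToyBatches` are hypothetical stand-ins I made up for `RecordBatch` and the function in this PR; the only thing it is meant to show is that the row count has to be summed even when there are no columns to concatenate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for arrow::RecordBatch: a row count plus columns.
// A batch with zero columns can still report a nonzero row count.
struct ToyBatch {
  int64_t num_rows = 0;
  std::vector<std::vector<int>> columns;
};

// Sketch only, not the Arrow implementation.
ToyBatch ConcatenateToyBatches(const std::vector<ToyBatch>& batches) {
  if (batches.empty()) {
    throw std::invalid_argument("Must pass at least one batch");
  }
  const std::size_t cols = batches[0].columns.size();
  ToyBatch out;
  out.columns.resize(cols);
  for (const ToyBatch& batch : batches) {
    if (batch.columns.size() != cols) {
      throw std::invalid_argument("All batches must have the same number of columns");
    }
    // Sum the row counts unconditionally, so the cols == 0 case keeps its length.
    out.num_rows += batch.num_rows;
    // With cols == 0 this inner loop does nothing.
    for (std::size_t c = 0; c < cols; ++c) {
      out.columns[c].insert(out.columns[c].end(), batch.columns[c].begin(),
                            batch.columns[c].end());
    }
  }
  // Invariant: every concatenated column has exactly num_rows entries.
  for (const auto& column : out.columns) {
    assert(static_cast<int64_t>(column.size()) == out.num_rows);
  }
  return out;
}
```

In this sketch, concatenating two column-less batches of 3 and 4 rows yields a column-less batch of 7 rows, which is the behavior a count(*) query would rely on.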