zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2803881895
It seems that when we merge the sorted batches, we are already using `interleave` to gather rows by the sorted indices. Here is the code:

```rust
/// Drains the in_progress row indexes, and builds a new RecordBatch from them
///
/// Will then drop any batches for which all rows have been yielded to the output
///
/// Returns `None` if no pending rows
pub fn build_record_batch(&mut self) -> Result<Option<RecordBatch>> {
    if self.is_empty() {
        return Ok(None);
    }

    let columns = (0..self.schema.fields.len())
        .map(|column_idx| {
            let arrays: Vec<_> = self
                .batches
                .iter()
                .map(|(_, batch)| batch.column(column_idx).as_ref())
                .collect();
            Ok(interleave(&arrays, &self.indices)?)
        })
        .collect::<Result<Vec<_>>>()?;

    self.indices.clear();
    // ...
```

But in this PR we also concat some batches into one batch. Do you mean we could also use the indices from each batch to build that single batch, just like in the merge phase?
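To make the question concrete, here is a minimal plain-Rust sketch of the two strategies being compared. It is not the arrow or DataFusion code: `Column`, `interleave_rows`, and `concat_then_take` are hypothetical stand-ins that use `Vec<i32>` instead of Arrow arrays. The only thing borrowed from the real API is the index shape, `(batch index, row index)` pairs, which is what the `interleave` call in the quoted snippet receives.

```rust
/// Hypothetical stand-in for one column of a record batch.
type Column = Vec<i32>;

/// Gather rows directly from the source batches using
/// (batch index, row index) pairs, without copying the batches first.
fn interleave_rows(batches: &[Column], indices: &[(usize, usize)]) -> Column {
    indices.iter().map(|&(b, r)| batches[b][r]).collect()
}

/// Alternative: copy every batch into one combined column first, then
/// gather using each batch's starting offset inside the combined column.
fn concat_then_take(batches: &[Column], indices: &[(usize, usize)]) -> Column {
    let mut offsets = Vec::with_capacity(batches.len());
    let mut combined = Vec::new();
    for batch in batches {
        offsets.push(combined.len());
        combined.extend_from_slice(batch);
    }
    indices
        .iter()
        .map(|&(b, r)| combined[offsets[b] + r])
        .collect()
}

fn main() {
    // Two batches, each already sorted.
    let batches = vec![vec![1, 4, 7], vec![2, 3, 9]];
    // Merge order expressed as (batch index, row index) pairs.
    let indices = vec![(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 2)];

    let direct = interleave_rows(&batches, &indices);
    let via_concat = concat_then_take(&batches, &indices);

    // Both strategies yield the same merged column.
    assert_eq!(direct, via_concat);
    println!("{:?}", direct);
}
```

Both functions return the same rows. The difference is that `concat_then_take` materializes an extra combined copy before gathering, which is the step the question asks whether we could avoid.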