zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957051337
########## datafusion/physical-plan/src/sorts/sort.rs: ########## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; - let size = get_record_batch_memory_size(&input); - + let size = get_reserved_byte_for_record_batch(&input); if self.reservation.try_grow(size).is_err() { - let before = self.reservation.size(); - self.in_mem_sort().await?; - - // Sorting may have freed memory, especially if fetch is `Some` - // - // As such we check again, and if the memory usage has dropped by - // a factor of 2, and we can allocate the necessary capacity, - // we don't spill - // - // The factor of 2 aims to avoid a degenerate case where the - // memory required for `fetch` is just under the memory available, - // causing repeated re-sorting of data - if self.reservation.size() > before / 2 - || self.reservation.try_grow(size).is_err() - { - self.spill().await?; - self.reservation.try_grow(size)? - } + self.sort_or_spill_in_mem_batches().await?; + // We've already freed more than half of reserved memory, + // so we can grow the reservation again. There's nothing we can do + // if this try_grow fails. Review Comment: > Reserves more memory for ingested batches to leave some room for merging. This PR reserves 2X memory for each batch, this works for most of the queries in sort-tpch benchmark (all except Q5 and Q8). User still have to configure sort_spill_reservation_bytes when memory reserved is not big enough for merging. I don't think it is a good change but this is the only solution I can think of to compensate for the extra memory usage for the row representation of sorted columns. Is it possible to make the merging phase also spillable? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org