gruuya commented on issue #7149: URL: https://github.com/apache/arrow-datafusion/issues/7149#issuecomment-1659812330
Oh, I think I see the problem: the fetch count is never used while the sort is accumulating batches:
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/sort.rs#L628-L632

Consequently, all the batches get loaded into memory inside the `ExternalSorter`. Logging each batch as it is pushed into `in_mem_batches` makes this visible:

```diff
diff --git a/datafusion/core/src/physical_plan/sorts/sort.rs b/datafusion/core/src/physical_plan/sorts/sort.rs
index f660f0acf..6eb63e39a 100644
--- a/datafusion/core/src/physical_plan/sorts/sort.rs
+++ b/datafusion/core/src/physical_plan/sorts/sort.rs
@@ -159,6 +159,7 @@ impl ExternalSorter {
             }
         }
 
+        debug!("Inserted batch of size {} for a total of {} in-memory batches", input.num_rows(), self.in_mem_batches.len() + 1);
         self.in_mem_batches.push(input);
         self.in_mem_batches_sorted = false;
         Ok(())
```

```bash
[2023-08-01T08:13:27Z TRACE datafusion::physical_plan::limit] Start GlobalLimitExec::execute for partition: 0
[2023-08-01T08:13:27Z TRACE datafusion::physical_plan::sorts::sort] Start SortExec::execute for partition 0 of context session_id 8855be36-07c1-495b-b6fd-9cb9501cb46d and task_id None
[2023-08-01T08:13:27Z TRACE datafusion::physical_plan::sorts::sort] End SortExec's input.execute for partition: 0
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 1 in-memory batches
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 2 in-memory batches
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 3 in-memory batches
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 4 in-memory batches
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 5 in-memory batches
[2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 6 in-memory batches
...
[2023-08-01T08:14:12Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 710 in-memory batches
[2023-08-01T08:14:12Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 6836 for a total of 711 in-memory batches
```
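Here is why honoring the fetch count during accumulation would matter. This is a minimal, hypothetical sketch and not DataFusion's API. `TopKAccumulator`, `insert_batch`, `retained` and `finish` are names made up for illustration. It also works on plain `i64` values rather than Arrow `RecordBatch`es. With a limit of `fetch` rows, the sort only needs to keep the `fetch` smallest values seen so far, so memory stays bounded no matter how many batches arrive.

```rust
use std::collections::BinaryHeap;

/// Hypothetical bounded accumulator (illustration only, not DataFusion code):
/// keeps at most `fetch` of the smallest values seen, instead of buffering
/// every incoming batch in memory.
struct TopKAccumulator {
    fetch: usize,
    // Max-heap: the largest retained value sits on top, so it is the one
    // evicted when a smaller value arrives and the heap is already full.
    heap: BinaryHeap<i64>,
}

impl TopKAccumulator {
    fn new(fetch: usize) -> Self {
        Self { fetch, heap: BinaryHeap::with_capacity(fetch) }
    }

    /// Consume one "batch" of values, retaining at most `fetch` of them overall.
    fn insert_batch(&mut self, batch: &[i64]) {
        if self.fetch == 0 {
            return; // a limit of 0 means nothing ever needs to be kept
        }
        for &v in batch {
            if self.heap.len() < self.fetch {
                self.heap.push(v);
            } else if let Some(&max) = self.heap.peek() {
                if v < max {
                    // Replace the current largest retained value with the smaller one.
                    self.heap.pop();
                    self.heap.push(v);
                }
            }
        }
    }

    /// Number of values currently held in memory.
    fn retained(&self) -> usize {
        self.heap.len()
    }

    /// Emit the retained values in ascending order.
    fn finish(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}

fn main() {
    // With fetch = 3, at most three values are held at any time,
    // however many batches are inserted.
    let mut acc = TopKAccumulator::new(3);
    for batch in [vec![9i64, 4, 7, 1], vec![8, 2, 6], vec![5, 3, 0]] {
        acc.insert_batch(&batch);
        assert!(acc.retained() <= 3);
    }
    println!("{:?}", acc.finish()); // prints [0, 1, 2]
}
```

A real operator would have to compare multi-column sort keys and carry the remaining columns of each row along. The single-value max-heap here only demonstrates the bounded-memory idea, in contrast to the 711 unbounded in-memory batches in the log above.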
