gruuya commented on issue #7149:
URL: https://github.com/apache/arrow-datafusion/issues/7149#issuecomment-1659812330

   Oh, I think I see the problem: the fetch count is never used while the sort is accumulating batches:
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/sort.rs#L628-L632
   
   Consequently, all the batches get loaded into memory inside the `ExternalSorter`, as this added debug log shows:
   ```diff
   diff --git a/datafusion/core/src/physical_plan/sorts/sort.rs b/datafusion/core/src/physical_plan/sorts/sort.rs
   index f660f0acf..6eb63e39a 100644
   --- a/datafusion/core/src/physical_plan/sorts/sort.rs
   +++ b/datafusion/core/src/physical_plan/sorts/sort.rs
   @@ -159,6 +159,7 @@ impl ExternalSorter {
                }
            }
    
   +        debug!("Inserted batch of size {} for a total of {} in-memory batches", input.num_rows(), self.in_mem_batches.len() + 1);
            self.in_mem_batches.push(input);
            self.in_mem_batches_sorted = false;
            Ok(())
   ```
   ```text
   [2023-08-01T08:13:27Z TRACE datafusion::physical_plan::limit] Start GlobalLimitExec::execute for partition: 0
   [2023-08-01T08:13:27Z TRACE datafusion::physical_plan::sorts::sort] Start SortExec::execute for partition 0 of context session_id 8855be36-07c1-495b-b6fd-9cb9501cb46d and task_id None
   [2023-08-01T08:13:27Z TRACE datafusion::physical_plan::sorts::sort] End SortExec's input.execute for partition: 0
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 1 in-memory batches
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 2 in-memory batches
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 3 in-memory batches
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 4 in-memory batches
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 8192 for a total of 5 in-memory batches
   [2023-08-01T08:13:28Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 6 in-memory batches
   ...
   [2023-08-01T08:14:12Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 209 for a total of 710 in-memory batches
   [2023-08-01T08:14:12Z DEBUG datafusion::physical_plan::sorts::sort] Inserted batch of size 6836 for a total of 711 in-memory batches
   ```
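   To illustrate why this matters, here is a minimal, std-only Rust sketch. It assumes a single `i64` sort key and uses plain `Vec<i64>` in place of Arrow `RecordBatch`es; `NaiveSorter`, `TopKSorter`, `make_batches` and `run` are hypothetical names, not DataFusion APIs. `NaiveSorter` mirrors the log above (every batch is buffered and `fetch` is applied only at the end), while `TopKSorter` is one possible way of using the fetch count during accumulation: after each batch it keeps only the `fetch` smallest rows seen so far.

   ```rust
   // Hypothetical sketch, not DataFusion code: `Vec<i64>` stands in for an
   // Arrow `RecordBatch`, and all type/function names are illustrative.

   /// Buffers every batch until the input ends; `fetch` is applied only in `finish`.
   struct NaiveSorter {
       in_mem_batches: Vec<Vec<i64>>,
   }

   impl NaiveSorter {
       fn insert_batch(&mut self, batch: Vec<i64>) {
           self.in_mem_batches.push(batch);
       }

       fn buffered_rows(&self) -> usize {
           self.in_mem_batches.iter().map(Vec::len).sum()
       }

       fn finish(self, fetch: usize) -> Vec<i64> {
           let mut rows: Vec<i64> = self.in_mem_batches.into_iter().flatten().collect();
           rows.sort_unstable();
           rows.truncate(fetch);
           rows
       }
   }

   /// Uses `fetch` while accumulating: rows that cannot be among the `fetch`
   /// smallest values seen so far are discarded after every batch.
   struct TopKSorter {
       fetch: usize,
       buffer: Vec<i64>,
   }

   impl TopKSorter {
       fn insert_batch(&mut self, batch: Vec<i64>) {
           self.buffer.extend(batch);
           if self.buffer.len() > self.fetch {
               // Move the `fetch` smallest values to the front (unordered), drop the rest.
               self.buffer.select_nth_unstable(self.fetch);
               self.buffer.truncate(self.fetch);
           }
       }

       fn buffered_rows(&self) -> usize {
           self.buffer.len()
       }

       fn finish(mut self) -> Vec<i64> {
           self.buffer.sort_unstable();
           self.buffer
       }
   }

   /// Deterministic pseudo-random batches (a 64-bit LCG), so no crates are needed.
   fn make_batches(n_batches: usize, rows_per_batch: usize) -> Vec<Vec<i64>> {
       let mut state: u64 = 42;
       let mut batches = Vec::with_capacity(n_batches);
       for _ in 0..n_batches {
           let mut batch = Vec::with_capacity(rows_per_batch);
           for _ in 0..rows_per_batch {
               state = state
                   .wrapping_mul(6_364_136_223_846_793_005)
                   .wrapping_add(1_442_695_040_888_963_407);
               batch.push((state >> 33) as i64);
           }
           batches.push(batch);
       }
       batches
   }

   /// Feeds the same batches to both sorters; returns (naive output, fetch-aware
   /// output, naive peak rows, fetch-aware peak rows), peaks measured after each insert.
   fn run(fetch: usize, batches: &[Vec<i64>]) -> (Vec<i64>, Vec<i64>, usize, usize) {
       let mut naive = NaiveSorter { in_mem_batches: Vec::new() };
       let mut topk = TopKSorter { fetch, buffer: Vec::new() };
       let (mut naive_peak, mut topk_peak) = (0usize, 0usize);
       for batch in batches {
           naive.insert_batch(batch.clone());
           topk.insert_batch(batch.clone());
           naive_peak = naive_peak.max(naive.buffered_rows());
           topk_peak = topk_peak.max(topk.buffered_rows());
       }
       (naive.finish(fetch), topk.finish(), naive_peak, topk_peak)
   }

   fn main() {
       let fetch = 10;
       let batches = make_batches(100, 8192);
       let (naive_out, topk_out, naive_peak, topk_peak) = run(fetch, &batches);
       assert_eq!(naive_out, topk_out);
       println!("naive peak rows: {naive_peak}, fetch-aware peak rows: {topk_peak}");
   }
   ```

   With these inputs both sorters return the same 10 rows, but the naive one ends up holding all 819200 rows while the fetch-aware one keeps 10 between inserts (transiently up to `fetch` plus one batch). `select_nth_unstable` is used instead of a full sort per batch because only the partition point matters until the final sort in `finish`.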
   

