richox opened a new issue, #2829:
URL: https://github.com/apache/arrow-datafusion/issues/2829

   **Describe the bug**
   
   Currently, each requester can allocate at most `(pool_size - tracker_total) / num_requesters` memory. If an execution plan is skewed, a lot of memory is wasted and unexpected spills occur.
   
   For example, when we perform a sort-merge join on two inputs and one of them is very small:
   ```
   SortMergeJoin {
       left: Sort { // bigger input
       },
       right: Sort { // smaller input
       },
       ...
   }
   ```
   
   the bigger sorter can use only half of the total memory, which triggers many more spills and severely degrades performance.
   
   **To Reproduce**
   
   In the following case, we have 3 requesters requesting 90 out of 100 units of memory. Since the memory is never used up, there should be no spills during execution, but the big requester actually spills every time.
   ```
       #[tokio::test]
       async fn multiple_skewed_requesters() {
           let config = RuntimeConfig::new()
            .with_memory_manager(MemoryManagerConfig::try_new_limit(100, 1.0).unwrap());
           let runtime = Arc::new(RuntimeEnv::new(config).unwrap());
   
           let big_requester1 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&big_requester1.id);
           let small_requester2 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&small_requester2.id);
           let small_requester3 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&small_requester3.id);
   
           small_requester2.do_with_mem(5).await.unwrap();
           small_requester3.do_with_mem(5).await.unwrap();
   
           big_requester1.do_with_mem(40).await.unwrap();
           big_requester1.do_with_mem(40).await.unwrap();
        assert_eq!(big_requester1.get_spills(), 0); // assertion fails, actually = 2
       }
   ```
   **Expected behavior**
   
   Spill only when all memory is used up. The spill strategy can also be optimized; for example, we can find the requester holding the most memory and let it spill.
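   One possible shape for that strategy (a hypothetical sketch, not an existing DataFusion API): only trigger a spill once the whole pool is exhausted, and pick the requester currently holding the most memory as the victim.

   ```rust
   /// Hypothetical spill-victim selection: returns the id of the requester
   /// that should spill, or None if free memory remains in the pool.
   /// `usage` holds (requester_id, bytes_held) pairs.
   fn pick_spill_victim(usage: &[(usize, usize)], pool_size: usize) -> Option<usize> {
       let total: usize = usage.iter().map(|(_, bytes)| bytes).sum();
       if total < pool_size {
           return None; // free memory remains: no spill needed
       }
       // Pool exhausted: spill the largest consumer.
       usage.iter().max_by_key(|(_, bytes)| *bytes).map(|(id, _)| *id)
   }

   fn main() {
       // 3 requesters holding 80 + 5 + 5 of a 100-unit pool: no spill yet.
       assert_eq!(pick_spill_victim(&[(1, 80), (2, 5), (3, 5)], 100), None);
       // Pool exhausted: the biggest requester (id 1) is asked to spill.
       assert_eq!(pick_spill_victim(&[(1, 90), (2, 5), (3, 5)], 100), Some(1));
   }
   ```

   Under this scheme, the test above would pass: the two small requesters hold only 10 units, so the big requester can grow to 80 without spilling.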
   
   

