jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506166745

   Before investigating how to reduce memory usage further, we could improve 
`datafusion-cli` (or add a new option in DataFusion core) to size the memory 
pool from the machine's memory limit.
   (PS: it already uses the number of CPU cores for parallel reading/sorting, 
but it does not take machine memory into account.)
   
   
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion/execution/src/runtime_env.rs#L152-L161
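As a rough illustration of what "set the memory pool from the machine memory limit" could mean, here is a minimal std-only sketch. It assumes a Linux host where total memory can be read from `/proc/meminfo`; the helper names `parse_mem_total_bytes` and `pool_size_bytes` are hypothetical and not part of DataFusion. A real implementation would probably use a cross-platform crate instead.

```rust
use std::fs;

/// Hypothetical helper: parse the `MemTotal` line of a Linux
/// `/proc/meminfo` dump and return the value in bytes.
fn parse_mem_total_bytes(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        // Lines look like "MemTotal:       16384000 kB".
        let rest = line.strip_prefix("MemTotal:")?;
        let kib: u64 = rest.trim().strip_suffix("kB")?.trim().parse().ok()?;
        kib.checked_mul(1024)
    })
}

/// Hypothetical helper: pool size as a fraction of total machine memory.
fn pool_size_bytes(total_bytes: u64, fraction: f64) -> u64 {
    (total_bytes as f64 * fraction) as u64
}

fn main() {
    // Assumption: Linux only. On other platforms the file is missing and we fall back.
    let meminfo = fs::read_to_string("/proc/meminfo").unwrap_or_default();
    match parse_mem_total_bytes(&meminfo) {
        Some(total) => println!(
            "machine memory = {total} bytes, pool = {} bytes",
            pool_size_bytes(total, 0.8)
        ),
        None => println!("could not detect machine memory; leaving the pool unbounded"),
    }
}
```

The resulting byte count and fraction are the kind of values one would then pass to `with_memory_limit`.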
   
   ---
   
   Unfortunately, with a limit set, my local run fails with the error shown under "after".
   
   ### before
   Unlimited memory pool: query took 115.120 seconds.
   
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion-cli/src/main.rs#L149-L152
   
   ### after
   Limit of 16*10^9 bytes (~14.9 GiB), 80% fraction:
   `Resources exhausted: Failed to allocate additional 59290364 bytes for 
GroupedHashAggregateStream[1] with 149089206 bytes already allocated - maximum 
available is 16563438`
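   ```
   use datafusion::error::Result;
   use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};

   fn create_runtime_env() -> Result<RuntimeEnv> {
       // Cap the memory pool at 80% of 16 * 10^9 bytes.
       let rn_config = RuntimeConfig::new().with_memory_limit(
           16_000_000_000,
           0.8
       );
       RuntimeEnv::new(rn_config)
   }
   ```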
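
To make the numbers in the error easier to read, here is a small std-only sketch of a greedy pool. It assumes (as the linked `runtime_env.rs` lines appear to do) that `with_memory_limit` sizes the pool as `max_memory * fraction`; `GreedyPoolSketch` is a simplified stand-in written for illustration, not DataFusion's actual type.

```rust
/// Simplified stand-in for a greedy memory pool (illustration only).
struct GreedyPoolSketch {
    pool_size: u64,
    used: u64,
}

impl GreedyPoolSketch {
    /// Assumed sizing rule: pool size = max_memory * fraction.
    fn with_memory_limit(max_memory: u64, fraction: f64) -> Self {
        Self { pool_size: (max_memory as f64 * fraction) as u64, used: 0 }
    }

    fn available(&self) -> u64 {
        self.pool_size.saturating_sub(self.used)
    }

    /// Grant the request only if it fits in what is left of the pool.
    fn try_grow(&mut self, additional: u64) -> Result<(), String> {
        let available = self.available();
        if additional > available {
            return Err(format!(
                "Failed to allocate additional {additional} bytes - maximum available is {available}"
            ));
        }
        self.used += additional;
        Ok(())
    }
}

fn main() {
    let mut pool = GreedyPoolSketch::with_memory_limit(16_000_000_000, 0.8);
    println!("pool size = {} bytes", pool.pool_size);

    // Reserve everything except the 16563438 bytes the error reported as available.
    let already_reserved = pool.pool_size - 16_563_438;
    pool.try_grow(already_reserved).unwrap();

    // The 59290364-byte request from the error no longer fits.
    println!("{:?}", pool.try_grow(59_290_364));
}
```

Under that assumption the pool is 12.8 * 10^9 bytes. The failing stream itself holds only about 149 MB, so the remaining reservations (roughly 12.6 * 10^9 bytes) would have to belong to other consumers sharing the same pool.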

