jychen7 commented on issue #5969: URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506166745
Before investigating deeper ways to reduce memory usage, we could improve `datafusion-cli` (or add a new option in DataFusion core) to set the memory pool limit from the machine's memory. (PS: it already uses the number of CPU cores for parallel reading/sorting, but does nothing equivalent for memory.)

https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion/execution/src/runtime_env.rs#L152-L161

---

Unfortunately, my local run returns the following error.

### before

Unlimited memory pool; the query took 115.120 seconds.

https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion-cli/src/main.rs#L149-L152

### after

Limit of 16*10^9 bytes (~15GB) with an 80% fraction:

`Resources exhausted: Failed to allocate additional 59290364 bytes for GroupedHashAggregateStream[1] with 149089206 bytes already allocated - maximum available is 16563438`

```
fn create_runtime_env() -> Result<RuntimeEnv> {
    // 16 * 10^9 bytes as the machine limit, 80% of which goes to the memory pool
    let rn_config = RuntimeConfig::new().with_memory_limit(16000000000, 0.8);
    RuntimeEnv::new(rn_config)
}
```
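To put the numbers in the error message in context, here is a rough sketch of the arithmetic. It is not DataFusion code. It assumes the pool size is `max_memory * memory_fraction` and that "maximum available" means the pool size minus everything currently reserved. `effective_pool_size` and `fits` are hypothetical helpers named only for this illustration.

```rust
/// Hypothetical helper: pool size implied by a limit and a fraction,
/// assuming the pool is simply `max_memory * memory_fraction`.
fn effective_pool_size(max_memory: u64, memory_fraction: f64) -> u64 {
    (max_memory as f64 * memory_fraction) as u64
}

/// Hypothetical helper: a request fits only if it does not exceed
/// what is still available in the pool.
fn fits(available: u64, additional: u64) -> bool {
    additional <= available
}

fn main() {
    // Values from the configuration and the error message above.
    let pool = effective_pool_size(16_000_000_000, 0.8);
    let available: u64 = 16_563_438;
    let additional: u64 = 59_290_364;

    // Under the assumption above, this is what all consumers
    // together had reserved when the allocation failed.
    let reserved = pool - available;

    println!("pool size:      {pool} bytes");
    println!("reserved total: {reserved} bytes");
    println!("request fits:   {}", fits(available, additional));
}
```

If these assumptions hold, the pool was almost fully reserved when the failure occurred, even though `GroupedHashAggregateStream[1]` itself held only about 149 MB. That would point to the other consumers in the plan rather than this one stream.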
