DDtKey commented on issue #5108: URL: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1412879293
This behavior (at least in my case, described above) was introduced in a9ddcd3a7558437361835120659b946b903468e1 ([PR link](https://github.com/apache/arrow-datafusion/pull/4867)). Before that change it returned `Resources exhausted` when I used a memory pool; now the memory usage grows until OOM. It can be reproduced with code similar to this:

```rust
let ctx = SessionContext::with_config_rt(
    SessionConfig::default(),
    Arc::new(
        RuntimeEnv::new(
            RuntimeConfig::new()
                .with_memory_pool(Arc::new(FairSpillPool::new(4 * 1024 * 1024 * 1024))),
        )
        .unwrap(),
    ),
);

// File size ~ 1.3 GB.
// I can share the file - it's kind of random data, but I'm not sure what I can
// use to do that. However, it's reproducible for any large file with this code.
ctx.register_csv("hr", file_path, CsvReadOptions::default())
    .await?;

// 4 joins - just to represent the problem
let data_frame = ctx
    .sql(
        r#"
        SELECT hr1."Emp_ID" from hr hr1
        left join hr hr2 on hr1."Emp_ID" = hr2."Emp_ID"
        left join hr hr3 on hr2."Emp_ID" = hr3."Emp_ID"
        left join hr hr4 on hr3."Emp_ID" = hr4."Emp_ID"
        "#,
    )
    .await?;

data_frame
    .write_csv(output_path)
    .await?;
```
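For context, the contract I expected the bounded pool to enforce can be sketched with a toy reservation check. This is illustrative only, not DataFusion's actual `FairSpillPool` implementation; the type and method names below are made up for the sketch:

```rust
// Toy illustration of the behavior a bounded memory pool should provide:
// reservations past the cap fail with a "Resources exhausted" error instead
// of letting memory grow until the OS kills the process.
struct BoundedPool {
    capacity: usize,
    used: usize,
}

impl BoundedPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, used: 0 }
    }

    /// Try to reserve `bytes`; refuse (rather than over-allocate) when the
    /// request would exceed the configured capacity.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.capacity {
            Err(format!(
                "Resources exhausted: requested {} bytes with {} of {} already in use",
                bytes, self.used, self.capacity
            ))
        } else {
            self.used += bytes;
            Ok(())
        }
    }
}
```

With the regression, the join pipeline behaves as if allocations bypass this check, so the configured 4 GB limit is never hit and the process OOMs instead.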
