DDtKey commented on issue #5108: URL: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1412879293
This behavior (at least in my case, described above) was introduced in a9ddcd3a7558437361835120659b946b903468e1 ([PR link](https://github.com/apache/arrow-datafusion/pull/4867)). Before that change it returned `Resources exhausted` when I used a memory pool; now the memory usage grows until OOM. It can be reproduced with code similar to this:

```rust
let ctx = SessionContext::with_config_rt(
    SessionConfig::default(),
    Arc::new(
        RuntimeEnv::new(
            RuntimeConfig::new()
                .with_memory_pool(Arc::new(FairSpillPool::new(4 * 1024 * 1024 * 1024))),
        )
        .unwrap(),
    ),
);

// File size ~ 1.3 GB.
// I can share the file - it's kind of random data, but I'm not sure what I can
// use to do that. However, it's reproducible for any large file with this code.
ctx.register_csv("hr", file_path, CsvReadOptions::default())
    .await?;

// 4 joins - just to represent the problem
let data_frame = ctx
    .sql(
        r#"
        SELECT hr1."Emp_ID" from hr hr1
        left join hr hr2 on hr1."Emp_ID" = hr2."Emp_ID"
        left join hr hr3 on hr2."Emp_ID" = hr3."Emp_ID"
        left join hr hr4 on hr3."Emp_ID" = hr4."Emp_ID"
        "#,
    )
    .await?;

data_frame
    .write_csv(output_path)
    .await?;
```
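For context, the contract I expected the bounded pool to enforce can be sketched with a toy reservation check. This is illustrative only, not DataFusion's actual `FairSpillPool` implementation; the type and method names below are made up for the sketch:

```rust
// Toy illustration of the behavior a bounded memory pool should provide:
// reservations past the cap fail with a "Resources exhausted" error instead
// of letting memory grow until the OS kills the process.
struct BoundedPool {
    capacity: usize,
    used: usize,
}

impl BoundedPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, used: 0 }
    }

    /// Try to reserve `bytes`; refuse (rather than over-allocate) when the
    /// request would exceed the configured capacity.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.capacity {
            Err(format!(
                "Resources exhausted: requested {} bytes with {} of {} already in use",
                bytes, self.used, self.capacity
            ))
        } else {
            self.used += bytes;
            Ok(())
        }
    }
}
```

With the regression, the join pipeline behaves as if allocations bypass this check, so the configured 4 GB limit is never hit and the process OOMs instead.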
