alamb commented on issue #5108:
URL: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1414320116

   My measurements actually suggest that DataFusion 17.0.0 is better in this regard than DataFusion 16.0.0.
   
   
   
   Using this input file:
   ```shell
   curl -L 'https://drive.google.com/uc?export=download&id=18gv0Yd_a-Zc7CSolol8qeYVAAzSthnSN&confirm=t' > lineitem.parquet
   ```
   
   Using this program:
   
   
   ```rust
   use std::sync::Arc;
   
   use datafusion::{
       error::Result,
       execution::{
           disk_manager::DiskManagerConfig,
           memory_pool::{FairSpillPool, GreedyMemoryPool},
           runtime_env::{RuntimeConfig, RuntimeEnv},
       },
       prelude::{SessionConfig, SessionContext},
   };
   
   #[tokio::main(flavor = "multi_thread", worker_threads = 10)]
   async fn main() -> Result<()> {
       // 1 GiB memory limit; swap the commented line in to test GreedyMemoryPool instead
       let runtime_config = RuntimeConfig::new()
           //.with_memory_pool(Arc::new(GreedyMemoryPool::new(1024 * 1024 * 1024)))
           .with_memory_pool(Arc::new(FairSpillPool::new(1024 * 1024 * 1024)))
           // spill files are written under /tmp
           .with_disk_manager(DiskManagerConfig::new_specified(vec!["/tmp/".into()]));
   
       let runtime = Arc::new(RuntimeEnv::new(runtime_config).unwrap());
       let ctx = SessionContext::with_config_rt(SessionConfig::new(), runtime);
   
       ctx.register_parquet(
           "lineitem",
           "/Users/alamb/Downloads/lineitem.parquet",
           Default::default(),
       )
       .await
       .unwrap();
   
       // Sort the whole table, then write the sorted result back out as Parquet
       let df = ctx
           .sql("select * from lineitem order by l_shipdate")
           .await
           .unwrap();
   
       df.write_parquet("/Users/alamb/Downloads/lineitem_Datafusion.parquet", None)
           .await
           .unwrap();
   
       Ok(())
   }
   ```
   
   I tested all four combinations of DataFusion `16.0.0` / `17.0.0` and `FairSpillPool` / `GreedyMemoryPool`, switching the dependency between
   
   
   ```toml
   datafusion = { version = "16.0.0" }
   ```
   
   or
   
   ```toml
   datafusion = { version = "17.0.0" }
   ```
   
   and the memory pool line between this:
   
   ```rust
           .with_memory_pool(Arc::new(FairSpillPool::new(1024*1024*1024)))
   ```
   
   Or
   ```rust
           .with_memory_pool(Arc::new(GreedyMemoryPool::new(1024*1024*1024)))
   ```
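To make the difference between the two pools more concrete, here is a small std-only Rust model of their admission policies. It is a minimal sketch, not DataFusion's code: the rules it encodes (greedy = first come, first served until the pool is full; fair spill = each spillable consumer capped at an even share of the memory not held by unspillable consumers, while unspillable consumers get whatever is left) are an assumption about how the pools behave, and `Consumer`, `Pool`, `two_sorts`, and the 100-byte pool size are hypothetical names and numbers chosen for illustration. In this model an unspillable consumer can only take memory the spillers have not reserved, which is one way a request can end up seeing "maximum available is 0".

```rust
// Simplified, hypothetical model of two memory-pool admission policies.
// This is NOT DataFusion's implementation; the types and sizes are illustrative.

/// A memory consumer (e.g. a sort or a repartition) and its current reservation.
struct Consumer {
    can_spill: bool,
    size: usize, // bytes currently reserved by this consumer
}

impl Consumer {
    fn new(can_spill: bool) -> Self {
        Consumer { can_spill, size: 0 }
    }
}

/// Carries the same kind of numbers as the "Resources exhausted" messages.
#[derive(Debug, PartialEq)]
struct AllocError {
    additional: usize, // bytes requested
    already: usize,    // bytes this consumer already holds
    available: usize,  // bytes the policy would still grant it
}

trait Pool {
    /// Called once per consumer before it allocates.
    fn register(&mut self, _c: &Consumer) {}
    fn try_grow(&mut self, c: &mut Consumer, additional: usize) -> Result<(), AllocError>;
}

/// Greedy: first come, first served until the whole pool is used.
struct GreedyPool {
    pool_size: usize,
    used: usize,
}

impl Pool for GreedyPool {
    fn try_grow(&mut self, c: &mut Consumer, additional: usize) -> Result<(), AllocError> {
        let available = self.pool_size - self.used;
        if additional > available {
            return Err(AllocError { additional, already: c.size, available });
        }
        self.used += additional;
        c.size += additional;
        Ok(())
    }
}

/// Fair spill: each spillable consumer is capped at an even share of the memory
/// not held by unspillable consumers; unspillable consumers get whatever is free.
struct FairSpillPool {
    pool_size: usize,
    num_spill: usize,   // number of registered spillable consumers
    spillable: usize,   // bytes held by spillable consumers
    unspillable: usize, // bytes held by unspillable consumers
}

impl Pool for FairSpillPool {
    fn register(&mut self, c: &Consumer) {
        if c.can_spill {
            self.num_spill += 1;
        }
    }

    fn try_grow(&mut self, c: &mut Consumer, additional: usize) -> Result<(), AllocError> {
        if c.can_spill {
            let spill_available = self.pool_size.saturating_sub(self.unspillable);
            // Even share per spiller; fall back to everything if none are registered.
            let share = spill_available.checked_div(self.num_spill).unwrap_or(spill_available);
            if c.size + additional > share {
                let available = share.saturating_sub(c.size);
                return Err(AllocError { additional, already: c.size, available });
            }
            self.spillable += additional;
        } else {
            let available = self.pool_size.saturating_sub(self.spillable + self.unspillable);
            if additional > available {
                return Err(AllocError { additional, already: c.size, available });
            }
            self.unspillable += additional;
        }
        c.size += additional;
        Ok(())
    }
}

/// Two spillable sorts share a 100-byte pool: A asks for 80 bytes, then B asks for 30.
fn two_sorts(pool: &mut dyn Pool) -> (Result<(), AllocError>, Result<(), AllocError>) {
    let (mut a, mut b) = (Consumer::new(true), Consumer::new(true));
    pool.register(&a);
    pool.register(&b);
    (pool.try_grow(&mut a, 80), pool.try_grow(&mut b, 30))
}

fn main() {
    let mut greedy = GreedyPool { pool_size: 100, used: 0 };
    let mut fair = FairSpillPool { pool_size: 100, num_spill: 0, spillable: 0, unspillable: 0 };
    println!("greedy: {:?}", two_sorts(&mut greedy));
    println!("fair:   {:?}", two_sorts(&mut fair));
}
```

In this model the greedy pool lets the first sort take 80 bytes and then rejects the second sort's 30-byte request (20 bytes left), while the fair pool rejects the first sort's 80-byte request because its share is 50 bytes (the point at which a real spillable operator would spill) and admits the second sort.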
   
   
   ## DataFusion `16.0.0` and `FairSpillPool`
   
   ```
        Running `/Users/alamb/Software/target-df/release/rust_arrow_playground`
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParquetError(ArrowError("underlying Arrow error: External error: Arrow error: External error: Resources exhausted: Failed to allocate additional 1419488 bytes for RepartitionExec[14] with 2837440 bytes already allocated - maximum available is 0"))', src/main.rs:26:6
   stack backtrace:
   
   ```
   
   ## DataFusion `16.0.0` and `GreedyMemoryPool`
   
   ```
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParquetError(ArrowError("underlying Arrow error: External error: Arrow error: External error: Resources exhausted: Failed to allocate additional 1419168 bytes for RepartitionExec[4] with 0 bytes already allocated - maximum available is 552160"))', src/main.rs:26:6
   ```
   
   ## DataFusion `17.0.0` and `FairSpillPool`
   
   The program completed successfully 🎉 
   
   ## DataFusion `17.0.0` and `GreedyMemoryPool`
   
   ```
   warning: `rust_arrow_playground` (bin "rust_arrow_playground") generated 1 warning
       Finished release [optimized] target(s) in 3m 35s
        Running `/Users/alamb/Software/target-df/release/rust_arrow_playground`
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParquetError(ArrowError("underlying Arrow error: External error: Arrow error: External error: Resources exhausted: Failed to allocate additional 1419168 bytes for RepartitionExec[4] with 0 bytes already allocated - maximum available is 552160"))', src/main.rs:26:6
   stack backtrace:
   ```
   
   

