LantaoJin opened a new issue, #83:
URL: https://github.com/apache/datafusion-java/issues/83

   ### Is your feature request related to a problem or challenge?
   
   PR **#28** added `tempDirectory(String)` to `SessionContextBuilder` so 
callers can route DataFusion's spill files to a chosen directory. That setter 
is the only Java surface for DataFusion's `RuntimeEnvBuilder` disk-manager 
knobs today. Three real gaps remain; all three are configurable on the Rust 
side, and none is reachable from Java:
   
   - **No way to spread spill across multiple volumes.** 
`tempDirectory(String)` accepts one path. Upstream's 
`DiskManagerMode::Directories(Vec<PathBuf>)` accepts many; spreading I/O across 
disks is a real production pattern when a single disk lacks the bandwidth or 
capacity for the spill workload.
   - **No way to disable spill entirely.** Upstream offers 
`DiskManagerMode::Disabled` for memory-only execution (queries that need spill 
fail with `ResourcesExhausted` rather than going to disk). Useful for pinning 
latency-sensitive queries to memory or for environments without writable disk.
   - **No way to cap total spill size.** Upstream's 
`RuntimeEnvBuilder::with_max_temp_directory_size(u64)` exists; without exposing 
it, a runaway sort or hash aggregate can fill the spill disk, which on a 
multi-tenant node is an outage for co-tenants, not just a query failure.
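
   The multi-volume and cap bullets interact. As I understand upstream's `DiskManager`, each new spill file is placed in one of the configured directories (chosen at random) and the cap is checked against the total written across all of them. The toy model below illustrates that reading; `SpillAccountingModel` and its methods are hypothetical names, not DataFusion or datafusion-java code, and the random directory choice is an assumption worth confirming against the upstream source.

   ```java
   import java.nio.file.Path;
   import java.util.List;
   import java.util.Random;

   /**
    * Toy model (hypothetical, not DataFusion code) of two assumed upstream behaviours:
    * each new spill file lands in one of the configured directories, and the size cap
    * is checked against the total written across ALL directories.
    */
   final class SpillAccountingModel {
       private final List<Path> dirs;
       private final long maxTotalBytes; // analogue of max_temp_directory_size
       private final Random rng;
       private long usedBytes;

       SpillAccountingModel(List<Path> dirs, long maxTotalBytes, Random rng) {
           if (dirs.isEmpty()) {
               throw new IllegalArgumentException("at least one spill directory is required");
           }
           this.dirs = List.copyOf(dirs);
           this.maxTotalBytes = maxTotalBytes;
           this.rng = rng;
       }

       /** Picks the directory for a new spill file (assumed: uniformly at random). */
       Path pickDirectory() {
           return dirs.get(rng.nextInt(dirs.size()));
       }

       /** Adds bytes to the shared total; throws once the total exceeds the cap. */
       void recordWrite(long bytes) {
           usedBytes += bytes;
           if (usedBytes > maxTotalBytes) {
               // IllegalStateException stands in for whatever error upstream raises.
               throw new IllegalStateException(
                   "spill usage " + usedBytes + " B exceeds cap of " + maxTotalBytes + " B");
           }
       }

       long usedBytes() {
           return usedBytes;
       }

       public static void main(String[] args) {
           long gib = 1L << 30;
           SpillAccountingModel model = new SpillAccountingModel(
               List.of(Path.of("/data1/df-spill"), Path.of("/data2/df-spill")),
               20 * gib,
               new Random(42));
           for (int i = 0; i < 4; i++) {
               Path dir = model.pickDirectory();
               model.recordWrite(5 * gib); // four 5 GiB files: exactly at the 20 GiB cap
               System.out.println("spill file " + i + " -> " + dir);
           }
           try {
               model.recordWrite(1); // one more byte in either directory trips the shared cap
           } catch (IllegalStateException e) {
               System.out.println("rejected: " + e.getMessage());
           }
       }
   }
   ```

   The point of the model is that a per-directory cap and a cumulative cap behave differently: with two 20 GiB volumes, a cumulative 20 GiB cap still stops at 20 GiB total, not 40 GiB.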
   
   None of the three is reachable via `setOption(...)`. `setOption` routes 
through DataFusion's `ConfigOptions::set(key, value)`, whereas the disk-manager 
configuration is fixed when the `RuntimeEnv` is constructed and does not live 
in that namespace.
   
   ### Describe the solution you'd like
   
   Three new setters on `SessionContextBuilder`, sitting next to the existing 
`tempDirectory(String)`:
   
   ```java
   // Spread spill across multiple volumes:
   SessionContext.builder()
       .tempDirectories(List.of("/data1/df-spill", "/data2/df-spill"))
       .maxTempDirectorySize(20L << 30)   // 20 GiB cap (cumulative across all dirs)
       .build();
   
   // Force memory-only execution; queries that would need spill fail fast:
   SessionContext.builder()
       .disableSpill()
       .build();
   
   // Single-dir + cap (the common case; the existing tempDirectory still works):
   SessionContext.builder()
       .tempDirectory("/tmp/df-spill")
       .maxTempDirectorySize(10L << 30)
       .build();
   ```
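
   A minimal sketch of the state the proposed setters would need to carry, assuming last-call-wins between `tempDirectory`, `tempDirectories` and `disableSpill`, and a build-time error when a size cap is combined with `disableSpill()`. The issue does not specify these semantics, and `SpillOptionsBuilder` / `SpillOptions` are hypothetical names for illustration, not datafusion-java types. The values are typed fields rather than `setOption` entries because they would be consumed when the `RuntimeEnv` is constructed.

   ```java
   import java.util.List;
   import java.util.OptionalLong;

   /** Hypothetical bundle of disk-manager settings handed to the native side at build time. */
   record SpillOptions(List<String> directories, boolean disabled, OptionalLong maxTempDirectorySize) {}

   /**
    * Hypothetical stand-in for the proposed SessionContextBuilder setters.
    * Assumed semantics: the three "where does spill go" setters are mutually
    * exclusive and the last call wins; a cap plus disableSpill() fails in build().
    */
   final class SpillOptionsBuilder {
       private List<String> directories = List.of(); // empty = upstream default (OS temp dir)
       private boolean disabled;
       private OptionalLong maxSize = OptionalLong.empty();

       /** Existing single-directory setter, expressed as a one-element list. */
       SpillOptionsBuilder tempDirectory(String dir) {
           return tempDirectories(List.of(dir));
       }

       SpillOptionsBuilder tempDirectories(List<String> dirs) {
           if (dirs.isEmpty()) {
               throw new IllegalArgumentException("tempDirectories requires at least one path");
           }
           directories = List.copyOf(dirs); // defensive copy; also rejects null entries
           disabled = false;
           return this;
       }

       SpillOptionsBuilder disableSpill() {
           directories = List.of();
           disabled = true;
           return this;
       }

       SpillOptionsBuilder maxTempDirectorySize(long bytes) {
           // Upstream takes a u64, so negative Java longs have no meaning there.
           if (bytes < 0) {
               throw new IllegalArgumentException("maxTempDirectorySize must be >= 0, got " + bytes);
           }
           maxSize = OptionalLong.of(bytes);
           return this;
       }

       SpillOptions build() {
           if (disabled && maxSize.isPresent()) {
               throw new IllegalStateException("maxTempDirectorySize has no effect when spill is disabled");
           }
           return new SpillOptions(directories, disabled, maxSize);
       }
   }
   ```

   Rejecting the cap-plus-disabled combination surfaces a likely misconfiguration early; silently ignoring the cap would be an equally defensible choice for the real builder.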
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.