Re: [PR] feat: introduce pluggable SpillFile trait and TempFileFactory for custom spill backends [datafusion]

via GitHub Fri, 19 Jun 2026 10:38:00 -0700


pantShrey commented on PR #21882:
URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4753642262


   @alamb Thank you so much for going through this! 
   
   The `RecordBatch`-level abstraction idea is really interesting, and I'll go 
through the implications more comprehensively , but two things come to mind 
right away that are worth aligning on:
   
   1. **Dependency direction:** `SpillFile` lives in `datafusion-execution`, 
which doesn't currently depend on `arrow-ipc`. Moving IPC encoding/decoding 
into the OS backend would add that dependency.
   2. **Schema:** If `read_stream()` returns a `SendableRecordBatchStream`, the 
backend needs a schema to initialize the decoder either stored at creation time 
or passed as a `SchemaRef` parameter.
   
   I do think this is the right long-term direction, and it would let us clean 
up `SpillWriteAdapter` entirely. That said, since this PR is currently blocking 
ParadeDB's Postgres integration, I'm a little worried that expanding the scope 
here might stall them further. 
   
   Happy to defer to your judgement on whether to tackle it in this PR or as a 
follow-up!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: introduce pluggable SpillFile trait and TempFileFactory for custom spill backends [datafusion]

Reply via email to