pantShrey commented on PR #21882: URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4753642262
@alamb Thank you so much for going through this! The `RecordBatch`-level abstraction idea is really interesting, and I'll go through the implications more comprehensively , but two things come to mind right away that are worth aligning on: 1. **Dependency direction:** `SpillFile` lives in `datafusion-execution`, which doesn't currently depend on `arrow-ipc`. Moving IPC encoding/decoding into the OS backend would add that dependency. 2. **Schema:** If `read_stream()` returns a `SendableRecordBatchStream`, the backend needs a schema to initialize the decoder either stored at creation time or passed as a `SchemaRef` parameter. I do think this is the right long-term direction, and it would let us clean up `SpillWriteAdapter` entirely. That said, since this PR is currently blocking ParadeDB's Postgres integration, I'm a little worried that expanding the scope here might stall them further. Happy to defer to your judgement on whether to tackle it in this PR or as a follow-up! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
