tustvold commented on issue #7994:
URL:
https://github.com/apache/arrow-datafusion/issues/7994#issuecomment-1806813395
Thank you for writing this up. Before getting into specific questions about
the design, I think it would help to articulate what the objectives here are.
Some possibilities might be:
1. To replicate the functionality currently provided by ListingTable
2. To facilitate the addition of new streaming sources beyond FIFO files
From my reading of the document, the proposed abstractions appear to largely
mirror the equivalent TableProvider abstractions used in #8021, whilst also
fitting quite closely to the current FIFO file mechanism based around
serialized byte streams. This makes me suspect that neither of the above is
quite the vision here?
On a more concrete level, it occurs to me that if StreamStoreRegistry were
moved under StreamingTableFactory, all of this functionality would be
encapsulated under one "extension point". Perhaps this would provide a
mechanism to iterate on and evolve these APIs incrementally, e.g. in
datafusion-contrib or similar, without needing to front-load the design effort?
This would allow new streaming sources to be added, and the abstractions
evolved as necessary, all in one place. I for one anticipate that Kafka, and
certainly Kinesis, will require some non-trivial iteration on these APIs to
accommodate their particular quirks. Just a suggestion, but it might not only
let us make progress on this ticket more quickly, but also yield a better
development experience for ongoing work on the streaming functionality.
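
As a rough illustration of the "one extension point" idea, the sketch below
keeps the registry as a private detail of the factory. It uses only the
standard library and is not DataFusion's actual API: the StreamSource trait,
FifoSource, the register/create methods, and the scheme-based lookup are all
hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a streaming source (FIFO file, Kafka, Kinesis, ...).
trait StreamSource {
    fn describe(&self) -> String;
}

/// Hypothetical source backed by a FIFO file at `path`.
struct FifoSource {
    path: String,
}

impl StreamSource for FifoSource {
    fn describe(&self) -> String {
        format!("fifo:{}", self.path)
    }
}

/// Builds a source from the part of the URL that follows "scheme://".
type SourceBuilder = Box<dyn Fn(&str) -> Box<dyn StreamSource>>;

/// Hypothetical factory that owns the registry privately, so callers only
/// ever see a single extension point.
struct StreamingTableFactory {
    registry: HashMap<String, SourceBuilder>,
}

impl StreamingTableFactory {
    fn new() -> Self {
        Self { registry: HashMap::new() }
    }

    /// Register a builder for a URL scheme such as "fifo".
    fn register<F>(&mut self, scheme: &str, builder: F)
    where
        F: Fn(&str) -> Box<dyn StreamSource> + 'static,
    {
        self.registry.insert(scheme.to_string(), Box::new(builder));
    }

    /// Resolve "scheme://rest" to a source; returns None if the URL has no
    /// "://" separator or the scheme is not registered.
    fn create(&self, url: &str) -> Option<Box<dyn StreamSource>> {
        let (scheme, rest) = url.split_once("://")?;
        let builder = self.registry.get(scheme)?;
        Some(builder(rest))
    }
}

/// Convenience constructor with only the FIFO source registered.
fn fifo_factory() -> StreamingTableFactory {
    let mut factory = StreamingTableFactory::new();
    factory.register("fifo", |path: &str| -> Box<dyn StreamSource> {
        Box::new(FifoSource { path: path.to_string() })
    });
    factory
}
```

Under this shape, a Kafka or Kinesis source would be added by calling register
on the factory, and the builder signature could change without touching any
other public surface.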