tustvold commented on issue #7994:
URL: 
https://github.com/apache/arrow-datafusion/issues/7994#issuecomment-1806813395

   Thank you for writing this up. Before getting into specific questions about 
the design, I think it would help to articulate what the objectives here are. 
Some possibilities might be:
   
   1. To replicate the functionality currently provided by ListingTable
   2. To facilitate the addition of new streaming sources beyond FIFO files
    
   From my reading of the document, the proposed abstractions appear to largely 
mirror the equivalent TableProvider abstractions used in #8021, whilst also 
fitting quite closely to the current FIFO file mechanism based around 
serialized byte streams. This makes me suspect that neither of the above is 
quite the vision here?
   
   On a more concrete level, it occurs to me that if StreamStoreRegistry were 
moved under StreamingTableFactory, all of this functionality would be 
encapsulated under one "extension point". Perhaps this might provide a 
mechanism to iterate on and evolve these APIs incrementally, e.g. in 
datafusion-contrib or similar, without needing to front-load the design effort? 
This would allow new streaming sources to be added, and the abstractions 
evolved as necessary, all in one place. I for one anticipate that Kafka, and 
certainly Kinesis, will require some non-trivial iteration on these APIs to 
accommodate their particular quirks. Just a suggestion, but it might not only 
allow us to make progress on this ticket more quickly, but also yield a better 
development experience for ongoing work on the streaming functionality.
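
   As a rough illustration of the "one extension point" idea, here is a 
minimal, self-contained sketch. It does not use the DataFusion crate, and every 
signature in it is an assumption: only the names StreamStoreRegistry and 
StreamingTableFactory come from the discussion, while StreamSource, FifoSource, 
SourceBuilder, with_fifo and create are hypothetical names invented for the 
example. The point is only that the registry is a private field of the factory, 
so the rest of the engine sees a single type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a streaming source (FIFO file, Kafka, ...).
trait StreamSource {
    fn describe(&self) -> String;
}

/// Hypothetical FIFO-backed source; it only remembers its path.
struct FifoSource {
    path: String,
}

impl StreamSource for FifoSource {
    fn describe(&self) -> String {
        format!("fifo:{}", self.path)
    }
}

/// Hypothetical constructor type: builds a source from the part of the
/// location that follows "scheme://".
type SourceBuilder = Box<dyn Fn(&str) -> Box<dyn StreamSource>>;

/// Assumed shape of the registry: a map from URL scheme to constructor.
#[derive(Default)]
struct StreamStoreRegistry {
    builders: HashMap<String, SourceBuilder>,
}

impl StreamStoreRegistry {
    fn register(&mut self, scheme: &str, builder: SourceBuilder) {
        self.builders.insert(scheme.to_string(), builder);
    }

    fn resolve(&self, location: &str) -> Option<Box<dyn StreamSource>> {
        let (scheme, rest) = location.split_once("://")?;
        let builder = self.builders.get(scheme)?;
        Some(builder(rest))
    }
}

/// The factory owns the registry, so the registry is an implementation
/// detail that can change without touching any other public API.
struct StreamingTableFactory {
    registry: StreamStoreRegistry,
}

impl StreamingTableFactory {
    /// Builds a factory that only knows about the "fifo" scheme.
    fn with_fifo() -> Self {
        let mut registry = StreamStoreRegistry::default();
        registry.register(
            "fifo",
            Box::new(|path: &str| {
                Box::new(FifoSource { path: path.to_string() }) as Box<dyn StreamSource>
            }),
        );
        Self { registry }
    }

    /// Single entry point: resolve a location string to a source.
    fn create(&self, location: &str) -> Result<Box<dyn StreamSource>, String> {
        self.registry
            .resolve(location)
            .ok_or_else(|| format!("no stream source registered for {location}"))
    }
}
```

   Under these assumptions, supporting Kafka or Kinesis would mean registering 
another builder inside the factory (or shipping a different factory from 
datafusion-contrib), and the registry's shape could be reworked freely because 
nothing outside the factory depends on it.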


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
