isidentical commented on issue #462: URL: https://github.com/apache/arrow-datafusion/issues/462#issuecomment-1270620697
I was thinking about how we can split this, and an initial plan might look like the one below, if there are no objections to separating `ContinuanceStream` into a patch of its own (if it sounds better, the first two steps can also be combined).

## Possible roadmap?

- [ ] Add continuance streams (a "working table" operation for DataFusion that actually uses streams under the hood).
  The implementation is self-contained enough that I think it could be split out (with tests). It would include the `push_relation_handler`/`pop_relation_handler` piece in task contexts, as well as the implementation of the physical operation. The only question is whether it is fine to add a new physical operation that doesn't have an immediate use.
- [ ] Implement recursive queries (as both a physical and a logical operation).
  This would be a sizable change that implements the initial piece of logic (without distinct), where we can execute queries until a certain condition has been met. It would also include the new logical operations (`RecursiveQuery` and `NamedRelation`) and the actual usage of the continuance streams.
- [ ] Enable SQL planning
  The SQL side of the implementation is completely decoupled from the actual logical/physical representation, so I think it can be added last. The algorithm basically uses a temporary CTE and then replaces it with the original form; more details are in the main PR.
- [ ] Start supporting `UNION`
  This would require us to record which values we have already collected (probably as hashes rather than direct references), and it would be a bit less efficient than the `UNION ALL` solution.
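To make the difference between the `UNION ALL` and `UNION` steps concrete, here is a minimal sketch of the usual working-table loop in plain Rust. It is not DataFusion code: `recursive_union_all`, `recursive_union`, and the closure-based `recursive_term` are hypothetical names for illustration, rows are plain values instead of record batches, and the `UNION` variant keeps whole rows in a `HashSet` rather than storing hashes as suggested above.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// `UNION ALL` semantics (hypothetical helper, not a DataFusion API).
/// The static term seeds the working table; the recursive term is re-run
/// against the latest working table until it produces no rows.
pub fn recursive_union_all<T, F>(static_term: Vec<T>, mut recursive_term: F) -> Vec<T>
where
    T: Clone,
    F: FnMut(&[T]) -> Vec<T>,
{
    let mut result = static_term.clone();
    let mut working_table = static_term;
    while !working_table.is_empty() {
        let next = recursive_term(&working_table);
        result.extend(next.iter().cloned());
        // The rows just produced become the next working table.
        working_table = next;
    }
    result
}

/// `UNION` semantics (hypothetical helper): rows that were already
/// collected are dropped, so only new rows feed the next iteration.
/// Assumption: whole rows are kept in `seen` for simplicity.
pub fn recursive_union<T, F>(static_term: Vec<T>, mut recursive_term: F) -> Vec<T>
where
    T: Clone + Eq + Hash,
    F: FnMut(&[T]) -> Vec<T>,
{
    let mut seen: HashSet<T> = HashSet::new();
    let mut result = Vec::new();
    let mut working_table = Vec::new();
    for row in static_term {
        // `insert` returns true only for rows not seen before.
        if seen.insert(row.clone()) {
            result.push(row.clone());
            working_table.push(row);
        }
    }
    while !working_table.is_empty() {
        let mut next = Vec::new();
        for row in recursive_term(&working_table) {
            if seen.insert(row.clone()) {
                result.push(row.clone());
                next.push(row);
            }
        }
        working_table = next;
    }
    result
}
```

In this sketch the extra `seen` set is the bookkeeping cost that makes `UNION` less efficient than `UNION ALL`. It is also what lets a query over cyclic data terminate: with `recursive_union_all`, a recursive term that keeps producing rows it has produced before would loop forever.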
