houqp commented on issue #970:
URL: 
https://github.com/apache/arrow-datafusion/issues/970#issuecomment-947946521


   Thanks @alamb for adding the diagrams, really helps to visualize the idea :)
   
   > We might even be able to do it without any changes to the TableProvider as 
of now.
   
   Question is without the table providers telling the planner that `Scan B` 
and `Scan C belongs to the same external database instance, how would it be 
able to decide whether these two scans can be combined into a single push down 
query? Imagine a case where Scan B is for a postgres database instance managed 
by AWS RDS, while scan C is for a postgres database instance managed in GCP, 
the planner need to be able to know they are from two different database 
instances so it can decide whether the join should happen within datafusion or 
gets pushed down.
   
   On top of that, we also need a way to let the planner know what compute plan 
nodes are supported by a particular database type. I think the table provider 
could be a good abstraction to provide this info.
   
   > The specifics of what types of subplans could be converted is probably 
specific to the database being pushed to, so it isn't clear that logic belongs 
in the main datafusion crate.
   
   I feel like this should be defined inside the table provider implementation, 
which should be maintained as plugins outside of datafusion core. Datafusion 
core should just maintain the current memory and listing table providers. Or 
maybe those two can be moved out of the core one day :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to