milenkovicm commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3228944531
Ballista will create cache partitions locally on some of the executors. Handling the physical part of execution is specific to the implementor, such as Ballista in this case.

The overall flow would be something like:

1. Ballista receives `LogicalPlan::Cache`.
2. `LogicalPlan::Cache` is then converted to `BallistaCacheReadExec`; this part is handled by `BallistaQueryPlanner`. (There is a bit more to it: a job can be started which would create the cache with `BallistaCacheWriteExec` if it is not materialised already, the task will be pending while the cache creation job is in progress, cache partitions can be re-created in case of node failures ...)

As the physical execution is tied to an external system (the cache implementer), I believe we do not need to bring `CachePhysicalExec` into DataFusion; we just need to provide `LogicalPlan::Cache`.

With the proposed solution we would be able to keep the current behaviour, or delegate cache handling to the external system if we wish. So if the user disables the local cache with `datafusion.execution.local_cache=false`, it would be up to them to provide a query planner which knows how to handle `LogicalPlan::Cache`.

Probably it would make more sense to rename `datafusion.execution.local_cache=false` to `datafusion.execution.external_cache=false`.
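To make the proposed delegation concrete, here is a minimal sketch of the dispatch described above. It is a toy model and does not use DataFusion's or Ballista's actual API: the `LogicalPlan` and `PhysicalPlan` enums, the `CachePlanner` trait, the `Planner` struct and `BallistaLikePlanner` are all hypothetical stand-ins, and the `local_cache` field only mirrors the proposed `datafusion.execution.local_cache` option.

```rust
// Toy model only: every type below is a hypothetical stand-in,
// not part of DataFusion or Ballista.

#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan(String),
    // Stand-in for the proposed `LogicalPlan::Cache` node.
    Cache { id: String, input: Box<LogicalPlan> },
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalPlan {
    ScanExec(String),
    // Stand-in for the current, in-process cache behaviour.
    LocalCacheExec { id: String, input: Box<PhysicalPlan> },
    // Opaque node produced by an external system (e.g. Ballista).
    External(String),
}

// Hook an external system would implement to plan cache nodes itself.
trait CachePlanner {
    fn plan_cache(&self, id: &str, input: &LogicalPlan) -> Result<PhysicalPlan, String>;
}

struct Planner<'a> {
    // Mirrors the proposed `datafusion.execution.local_cache` setting.
    local_cache: bool,
    external: Option<&'a dyn CachePlanner>,
}

impl<'a> Planner<'a> {
    fn plan(&self, plan: &LogicalPlan) -> Result<PhysicalPlan, String> {
        match plan {
            LogicalPlan::Scan(table) => Ok(PhysicalPlan::ScanExec(table.clone())),
            // Local cache enabled: keep the built-in behaviour.
            LogicalPlan::Cache { id, input } if self.local_cache => {
                Ok(PhysicalPlan::LocalCacheExec {
                    id: id.clone(),
                    input: Box::new(self.plan(input)?),
                })
            }
            // Local cache disabled: the user-supplied planner must handle it.
            LogicalPlan::Cache { id, input } => match self.external {
                Some(planner) => planner.plan_cache(id, input),
                None => Err(format!("no planner registered for cache `{id}`")),
            },
        }
    }
}

// Hypothetical external planner that maps a cache node to a read operator.
struct BallistaLikePlanner;

impl CachePlanner for BallistaLikePlanner {
    fn plan_cache(&self, id: &str, _input: &LogicalPlan) -> Result<PhysicalPlan, String> {
        // A real system could also schedule a job here to materialise the cache.
        Ok(PhysicalPlan::External(format!("BallistaCacheReadExec({id})")))
    }
}

// Small example plan: a cache node named `c1` over a scan of table `t`.
fn example_plan() -> LogicalPlan {
    LogicalPlan::Cache {
        id: "c1".to_string(),
        input: Box::new(LogicalPlan::Scan("t".to_string())),
    }
}
```

With `local_cache: true` the sketch plans `example_plan()` into a `LocalCacheExec`; with `local_cache: false` it hands the node to whatever `CachePlanner` was supplied, and returns an error if none was registered. The point of the design is that the core only needs the logical node, while the physical operator stays with the system that owns the cache.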