jayzhan211 opened a new issue, #11420: URL: https://github.com/apache/datafusion/issues/11420
### Is your feature request related to a problem or challenge? There are many functions in `datafusion-core` that take `SessionState` as arguments but only actually rely on portion of them. This add the additional dependency that is not necessary, therefore blocking us from extracting module out of core #10782. For example, If we want to pull `CatalogProvider` out of core, we need to pull out `TableProvider` first. But because it has `scan` function that takes `SessionState` which contains `CatalogProviderList` therefore there is a circular dependency. Similar issues are already mentioned in https://github.com/apache/datafusion/issues/11182 ### Describe the solution you'd like I think we need to redesign those functions that take `SessionState` and minimize the dependencies for them. Given one of the `scan` function here, we can see that we only need `state.config_options().explain` and `state.execution_props()` instead of the whole `SessionState` ```rust async fn scan( &self, state: &SessionState, projection: Option<&Vec<usize>>, _filters: &[Expr], _limit: Option<usize>, ) -> Result<Arc<dyn ExecutionPlan>> { let mut partitions = vec![]; for arc_inner_vec in self.batches.iter() { let inner_vec = arc_inner_vec.read().await; partitions.push(inner_vec.clone()) } let mut exec = MemoryExec::try_new(&partitions, self.schema(), projection.cloned())?; let show_sizes = state.config_options().explain.show_sizes; exec = exec.with_show_sizes(show_sizes); // add sort information if present let sort_order = self.sort_order.lock(); if !sort_order.is_empty() { let df_schema = DFSchema::try_from(self.schema.as_ref().clone())?; let file_sort_order = sort_order .iter() .map(|sort_exprs| { create_physical_sort_exprs( sort_exprs, &df_schema, state.execution_props(), ) }) .collect::<Result<Vec<_>>>()?; exec = exec.with_sort_information(file_sort_order); } Ok(Arc::new(exec)) } ``` In this case, we can introduce `TableProviderConext` that only contains portion of information from `SessionState`. ```rust pub struct TableProviderConext { explain_options: ExplainOptions, executionProps: ExecutionProps // maybe with reference to avoid copy } ``` The same idea applies to `PhysicalPlanner`, we usually just need `PhysicalOptimizeRules` or other information about plan. ```rust pub trait PhysicalPlanner: Send + Sync { /// Create a physical plan from a logical plan async fn create_physical_plan( &self, logical_plan: &LogicalPlan, session_state: &SessionState, ) -> Result<Arc<dyn ExecutionPlan>>; /// Create a physical expression from a logical expression /// suitable for evaluation /// /// `expr`: the expression to convert /// /// `input_dfschema`: the logical plan schema for evaluating `expr` fn create_physical_expr( &self, expr: &Expr, input_dfschema: &DFSchema, session_state: &SessionState, ) -> Result<Arc<dyn PhysicalExpr>>; } ``` ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org