jayzhan211 opened a new issue, #11420:
URL: https://github.com/apache/datafusion/issues/11420

   ### Is your feature request related to a problem or challenge?
   
   There are many functions in `datafusion-core` that take `SessionState` as 
arguments but only actually rely on portion of them. This add the additional 
dependency that is not necessary, therefore blocking us from extracting module 
out of core #10782.
   
   For example, If we want to pull `CatalogProvider` out of core, we need to 
pull out `TableProvider` first. But because it has `scan` function that takes 
`SessionState` which contains `CatalogProviderList` therefore there is a 
circular dependency.  Similar issues are already mentioned in 
https://github.com/apache/datafusion/issues/11182
    
   
   ### Describe the solution you'd like
   
   I think we need to redesign those functions that take `SessionState` and 
minimize the dependencies for them.
   
   Given one of the `scan` function here, we can see that we only need 
`state.config_options().explain` and `state.execution_props()` instead of the 
whole `SessionState`
   ```rust
   async fn scan(
           &self,
           state: &SessionState,
           projection: Option<&Vec<usize>>,
           _filters: &[Expr],
           _limit: Option<usize>,
       ) -> Result<Arc<dyn ExecutionPlan>> {
           let mut partitions = vec![];
           for arc_inner_vec in self.batches.iter() {
               let inner_vec = arc_inner_vec.read().await;
               partitions.push(inner_vec.clone())
           }
   
           let mut exec =
               MemoryExec::try_new(&partitions, self.schema(), 
projection.cloned())?;
   
           let show_sizes = state.config_options().explain.show_sizes;
           exec = exec.with_show_sizes(show_sizes);
   
           // add sort information if present
           let sort_order = self.sort_order.lock();
           if !sort_order.is_empty() {
               let df_schema = 
DFSchema::try_from(self.schema.as_ref().clone())?;
   
               let file_sort_order = sort_order
                   .iter()
                   .map(|sort_exprs| {
                       create_physical_sort_exprs(
                           sort_exprs,
                           &df_schema,
                           state.execution_props(),
                       )
                   })
                   .collect::<Result<Vec<_>>>()?;
               exec = exec.with_sort_information(file_sort_order);
           }
   
           Ok(Arc::new(exec))
       }
   ```
   
   In this case, we can introduce `TableProviderConext` that only contains 
portion of information from `SessionState`.
   
   ```rust
   pub struct TableProviderConext {
       explain_options: ExplainOptions,
       executionProps: ExecutionProps // maybe with reference to avoid copy
   }
   ```
   
   The same idea applies to `PhysicalPlanner`, we usually just need 
`PhysicalOptimizeRules` or other information about plan.
   ```rust
   pub trait PhysicalPlanner: Send + Sync {
       /// Create a physical plan from a logical plan
       async fn create_physical_plan(
           &self,
           logical_plan: &LogicalPlan,
           session_state: &SessionState,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   
       /// Create a physical expression from a logical expression
       /// suitable for evaluation
       ///
       /// `expr`: the expression to convert
       ///
       /// `input_dfschema`: the logical plan schema for evaluating `expr`
       fn create_physical_expr(
           &self,
           expr: &Expr,
           input_dfschema: &DFSchema,
           session_state: &SessionState,
       ) -> Result<Arc<dyn PhysicalExpr>>;
   }
   ```
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to