[GitHub] [arrow-datafusion] alamb commented on issue #1754: split datafusion-physical-plan sub-module

via GitHub Mon, 27 Mar 2023 13:47:59 -0700


alamb commented on issue #1754:
URL: 
https://github.com/apache/arrow-datafusion/issues/1754#issuecomment-1485840743


   
   
   Update 2023-03-27: 
   
   I have moved `TaskContext` into `datafusion-execution` 🎉 and updated the list in
https://github.com/apache/arrow-datafusion/issues/1754#issuecomment-1452438453
   
   To move datasource into its own crate, I need to make the following two nontrivial API changes (added to the list above):
   - [ ] remove the `SessionState` reference from `file_format` (replace with `TaskContext`)
   - [ ] remove the `SessionState` reference from `TableProvider` (replace with `TaskContext`)
   
   Here are the `file_format` uses (I don't think this change will be major):
   
   ```rust
       async fn infer_stats(
           &self,
           _state: &SessionState,
           _store: &Arc<dyn ObjectStore>,
           _table_schema: SchemaRef,
           _object: &ObjectMeta,
       ) -> Result<Statistics> {
           Ok(Statistics::default())
       }
   ```
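
As a rough illustration of why the `file_format` change looks mechanical: the default `infer_stats` ignores its state argument, so swapping the parameter type does not touch the body. The sketch below uses hypothetical stand-in types (`TaskContext`, `ObjectMeta`, `Statistics`, and `CsvLike` are simplified placeholders, not the real DataFusion definitions), drops the store and schema parameters, and is synchronous where the real method is `async`.

```rust
// Hypothetical stand-ins for the real DataFusion types; fields are invented
// for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
}

pub struct TaskContext {
    pub batch_size: usize,
}

pub struct ObjectMeta {
    pub size: usize,
}

pub trait FileFormat {
    // Assumed post-change shape: `&TaskContext` where `&SessionState` used to be.
    // The default implementation never reads the context, so nothing else changes.
    fn infer_stats(&self, _ctx: &TaskContext, _object: &ObjectMeta) -> Statistics {
        Statistics::default()
    }
}

// A format that relies entirely on the default implementation.
pub struct CsvLike;

impl FileFormat for CsvLike {}
```

Formats that override the method and actually read session configuration would need more care than this default case suggests.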
   
   
   I think the `TableProvider` change will be major:
   
   ```rust
       /// Create an ExecutionPlan that will scan the table.
       /// The table provider will be usually responsible of grouping
       /// the source data into partitions that can be efficiently
       /// parallelized or distributed.
       async fn scan(
           &self,
           state: &SessionState,
           projection: Option<&Vec<usize>>,
           filters: &[Expr],
           // limit can be used to reduce the amount scanned
           // from the datasource as a performance optimization.
           // If set, it contains the amount of rows needed by the `LogicalPlan`.
           // The datasource should return *at least* this number of rows if available.
           limit: Option<usize>,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   ```
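
One possible shape for the `TableProvider` change, sketched with stand-in types: `scan` takes a `TaskContext`, and the caller that still holds a `SessionState` builds the context before calling it. Everything here is an assumption for illustration (the `target_partitions` field, the `From` conversion, `MemTable`, `MemScan`, and `plan_scan` are hypothetical), the `filters` argument is omitted, and the method is synchronous where the real one is `async`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the planner-side state that lives in the core crate.
pub struct SessionState {
    pub target_partitions: usize,
}

// Hypothetical stand-in for the execution-side context a datasource crate could depend on.
pub struct TaskContext {
    pub target_partitions: usize,
}

impl From<&SessionState> for TaskContext {
    fn from(state: &SessionState) -> Self {
        TaskContext { target_partitions: state.target_partitions }
    }
}

pub trait ExecutionPlan {
    fn partition_count(&self) -> usize;
}

pub struct MemScan {
    pub partitions: usize,
    pub limit: Option<usize>,
}

impl ExecutionPlan for MemScan {
    fn partition_count(&self) -> usize {
        self.partitions
    }
}

pub trait TableProvider {
    // Assumed post-change signature: no `SessionState` in sight.
    fn scan(
        &self,
        ctx: &TaskContext,
        projection: Option<&Vec<usize>>,
        limit: Option<usize>,
    ) -> Result<Arc<dyn ExecutionPlan>, String>;
}

pub struct MemTable;

impl TableProvider for MemTable {
    fn scan(
        &self,
        ctx: &TaskContext,
        _projection: Option<&Vec<usize>>,
        limit: Option<usize>,
    ) -> Result<Arc<dyn ExecutionPlan>, String> {
        Ok(Arc::new(MemScan { partitions: ctx.target_partitions, limit }))
    }
}

// The caller keeps the `SessionState` and only hands the provider a `TaskContext`.
pub fn plan_scan(
    state: &SessionState,
    table: &dyn TableProvider,
) -> Result<Arc<dyn ExecutionPlan>, String> {
    let ctx = TaskContext::from(state);
    table.scan(&ctx, None, Some(10))
}
```

The point of the sketch is the dependency direction: only `plan_scan` needs to know about `SessionState`, so the trait and its implementations could live in a crate that does not depend on the core crate.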


