alamb commented on issue #1754: URL: https://github.com/apache/arrow-datafusion/issues/1754#issuecomment-1485840743
Update 2023-03-27: I have moved `TaskContext` into `datafusion-execution` 🎉 and updated the list in https://github.com/apache/arrow-datafusion/issues/1754#issuecomment-1452438453

To move the datasource code into its own crate, I need to make the following two non-trivial API changes (added to the list above):

- [ ] remove the `SessionState` reference from `file_format` (replace with `TaskContext`)
- [ ] remove the `SessionState` reference from `TableProvider` (replace with `TaskContext`)

Here are the `file_format` uses (I don't think this will be a major change):

```rust
async fn infer_stats(
    &self,
    _state: &SessionState,
    _store: &Arc<dyn ObjectStore>,
    _table_schema: SchemaRef,
    _object: &ObjectMeta,
) -> Result<Statistics> {
    Ok(Statistics::default())
}
```

The `TableProvider` change will be major, I think:

```rust
/// Create an ExecutionPlan that will scan the table.
/// The table provider will usually be responsible for grouping
/// the source data into partitions that can be efficiently
/// parallelized or distributed.
async fn scan(
    &self,
    state: &SessionState,
    projection: Option<&Vec<usize>>,
    filters: &[Expr],
    // limit can be used to reduce the amount scanned
    // from the datasource as a performance optimization.
    // If set, it contains the number of rows needed by the `LogicalPlan`.
    // The datasource should return *at least* this number of rows if available.
    limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>>;
```
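A minimal, self-contained sketch of why this swap would let the datasource code live in its own crate, assuming heavily simplified stand-in types. The modules `execution`, `datasource`, and `session` only play the roles of `datafusion-execution`, a future datasource crate, and the core crate that owns `SessionState`; `ScanPlan`, `MemTable`, `task_ctx()`, and the trimmed-down synchronous `scan` (no `filters`) are hypothetical and not the real DataFusion API. The point is the dependency direction: once `scan` takes a `&TaskContext`, the `datasource` module needs only `execution`, and the module that owns `SessionState` derives a `TaskContext` and passes it in.

```rust
// Stand-in for `datafusion-execution` (hypothetical, simplified).
mod execution {
    /// Hypothetical minimal `TaskContext`: only the per-task
    /// information a data source needs at scan time.
    #[derive(Debug, Clone)]
    pub struct TaskContext {
        pub session_id: String,
        pub batch_size: usize,
    }
}

// Stand-in for a future datasource crate: it depends ONLY on `execution`,
// never on the module that defines `SessionState`.
mod datasource {
    use super::execution::TaskContext;
    use std::sync::Arc;

    /// Hypothetical, simplified plan node returned by `scan`.
    #[derive(Debug)]
    pub struct ScanPlan {
        pub session_id: String,
        pub projection: Option<Vec<usize>>,
        pub limit: Option<usize>,
        pub batch_size: usize,
    }

    /// Simplified, synchronous version of the proposed signature:
    /// `scan` receives `&TaskContext` instead of `&SessionState`.
    pub trait TableProvider {
        fn scan(
            &self,
            ctx: &TaskContext,
            projection: Option<&Vec<usize>>,
            limit: Option<usize>,
        ) -> Result<Arc<ScanPlan>, String>;
    }

    /// Hypothetical table that just records what it was asked to scan.
    pub struct MemTable;

    impl TableProvider for MemTable {
        fn scan(
            &self,
            ctx: &TaskContext,
            projection: Option<&Vec<usize>>,
            limit: Option<usize>,
        ) -> Result<Arc<ScanPlan>, String> {
            Ok(Arc::new(ScanPlan {
                session_id: ctx.session_id.clone(),
                projection: projection.cloned(),
                limit,
                batch_size: ctx.batch_size,
            }))
        }
    }
}

// Stand-in for the core crate: it owns `SessionState` and may depend
// on both `execution` and `datasource`.
mod session {
    use super::execution::TaskContext;

    pub struct SessionState {
        pub session_id: String,
        pub batch_size: usize,
    }

    impl SessionState {
        /// Hypothetical helper: derive the narrower per-task context.
        pub fn task_ctx(&self) -> TaskContext {
            TaskContext {
                session_id: self.session_id.clone(),
                batch_size: self.batch_size,
            }
        }
    }
}

use crate::datasource::TableProvider;

fn main() {
    let state = session::SessionState {
        session_id: "s1".to_string(),
        batch_size: 8192,
    };
    // The caller converts SessionState -> TaskContext before scanning.
    let ctx = state.task_ctx();
    let plan = datasource::MemTable
        .scan(&ctx, Some(&vec![0, 2]), Some(10))
        .expect("scan failed");
    println!("{:?}", plan);
}
```

The same substitution would apply to `infer_stats`, whose `_state: &SessionState` parameter is unused in the default implementation above.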