gruuya opened a new issue, #7292: URL: https://github.com/apache/arrow-datafusion/issues/7292
### Is your feature request related to a problem or challenge?

A lot of DBs and table formats (e.g. Delta Lake) rely on one form or another of MVCC, whereby writes to a given table result in a new table version. Typically, writes append new files to the table state (`INSERT`) and/or remove some files from the state (`UPDATE`/`DELETE`).

A core feature of such systems is the ability to travel between different table versions, so that one can query some earlier (non-latest) table state. However, this is not currently officially supported by DataFusion, though it is doable in a hacky way (see below for details or [here](https://www.splitgraph.com/blog/seafowl-delta-storage-layer) for how it looks in seafowl right now).

### Describe the solution you'd like

The first part of the work would be in the `sqlparser` crate, which would need to support the standard temporal table specifier in the form of an `AS OF` clause (https://en.wikipedia.org/wiki/SQL:2011). I think this should probably be captured in a new field in [`TableFactor::Table`](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/query.rs#L650-L664).

On the DataFusion side, besides capturing the parsed version in the `TableScan` logical plan, I imagine the main (breaking) change would be to alter the signature of the `SchemaProvider::table` method to something like

```rust
async fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<dyn TableProvider>>;
```

with `TableVersion` being some kind of enum covering time formats or literal version denotations for starters. That would let the implementer of this trait know which specific table version needs to be loaded (if any).
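To make the shape of that change a bit more concrete, here is a minimal, self-contained sketch of how a version-aware lookup could behave. It is not DataFusion code: the `TableVersion` variants, the `VersionedSchemaProvider` trait, `TableSnapshot` (a stand-in for `Arc<dyn TableProvider>`) and `InMemorySchema` are all hypothetical names made up for illustration, and the trait is synchronous to avoid pulling in an async runtime. It also assumes commit timestamps never decrease from one version to the next.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical version specifier: a literal version number or a point in time.
#[derive(Debug, Clone, PartialEq)]
pub enum TableVersion {
    /// A literal table version, e.g. a commit number.
    Number(u64),
    /// A point in time, as milliseconds since the Unix epoch.
    TimestampMillis(i64),
}

/// Stand-in for `Arc<dyn TableProvider>`: the set of files making up one version.
#[derive(Debug, PartialEq)]
pub struct TableSnapshot {
    pub version: u64,
    pub committed_at_millis: i64,
    pub files: Vec<String>,
}

/// Simplified, synchronous analogue of the proposed `SchemaProvider::table`.
pub trait VersionedSchemaProvider {
    fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<TableSnapshot>>;
}

/// Toy catalog: table name -> (version number -> snapshot).
#[derive(Default)]
pub struct InMemorySchema {
    tables: BTreeMap<String, BTreeMap<u64, Arc<TableSnapshot>>>,
}

impl InMemorySchema {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records a new table version and returns its version number (starting at 0).
    pub fn commit(&mut self, name: &str, committed_at_millis: i64, files: Vec<String>) -> u64 {
        let versions = self.tables.entry(name.to_string()).or_default();
        let version = versions.keys().next_back().map_or(0, |v| v + 1);
        let snapshot = TableSnapshot { version, committed_at_millis, files };
        versions.insert(version, Arc::new(snapshot));
        version
    }
}

impl VersionedSchemaProvider for InMemorySchema {
    fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<TableSnapshot>> {
        let versions = self.tables.get(name)?;
        match version {
            // No specifier: behave as today and return the latest version.
            None => versions.values().next_back().cloned(),
            // Literal version: exact match or nothing.
            Some(TableVersion::Number(n)) => versions.get(&n).cloned(),
            // Timestamp: the newest version committed at or before that time.
            Some(TableVersion::TimestampMillis(ts)) => versions
                .values()
                .rev()
                .find(|s| s.committed_at_millis <= ts)
                .cloned(),
        }
    }
}

fn main() {
    let mut schema = InMemorySchema::new();
    schema.commit("events", 1_000, vec!["a.parquet".to_string()]);
    schema.commit("events", 2_000, vec!["a.parquet".to_string(), "b.parquet".to_string()]);

    println!("latest: {:?}", schema.table("events", None));
    println!("v0:     {:?}", schema.table("events", Some(TableVersion::Number(0))));
    println!(
        "as of:  {:?}",
        schema.table("events", Some(TableVersion::TimestampMillis(1_500)))
    );
}
```

Passing `None` keeps today's behaviour (latest version), which is why I'd lean towards an `Option` argument rather than a separate method, though either would work.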
### Describe alternatives you've considered

The alternative that seafowl uses atm is the following:
- abuse the existing table function syntax to smuggle the version information in a given `sqlparser::ast::Statement`[0]
- perform a walk over each incoming `sqlparser::ast::Query`, trying to see whether a table function syntax was used[1]
- if it was, rename the table to something unique, and register the `TableProvider` for the specific table version in a new session context/state[2]

[0] https://seafowl.io/docs/guides/querying-time-travel#querying-older-table-versions
[1] https://github.com/splitgraph/seafowl/blob/main/src/version.rs#L58-L91
[2] https://github.com/splitgraph/seafowl/blob/main/src/context.rs#L542-L565

### Additional context

If this is something that is deemed to be sufficiently important for/compatible with DataFusion, I'd be happy to take on the work needed to implement this.
