gruuya opened a new issue, #7292:
URL: https://github.com/apache/arrow-datafusion/issues/7292

   ### Is your feature request related to a problem or challenge?
   
   A lot of DBs and table formats (e.g. Delta Lake) rely on one form or another
of MVCC, whereby each write to a given table results in a new table version.
Typically, writes append new files to the table state (`INSERT`) and/or
remove some files from it (`UPDATE`/`DELETE`).
   
   A core feature of such systems is the ability to travel between different 
table versions, so that one can query some earlier (non-latest) table state. 
However, this is not currently officially supported by DataFusion, though it is 
doable in a hacky way (see below for details or 
[here](https://www.splitgraph.com/blog/seafowl-delta-storage-layer) for how it 
looks in seafowl right now). 
   
   ### Describe the solution you'd like
   
   The first part of the work would be in the `sqlparser` crate, which would
need to support the standard temporal table specifier in the form of an `AS OF`
clause (https://en.wikipedia.org/wiki/SQL:2011).
   
   I think this should be captured in a new field of
[`TableFactor::Table`](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/query.rs#L650-L664).
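
To make the parser change concrete, here is a toy sketch of what such a field would capture. It does not use sqlparser at all: `VersionedTableFactor` and `parse_table_ref` are hypothetical names, the real `TableFactor::Table` has more fields, and the whitespace tokenizing is deliberately naive. It accepts both a bare `AS OF` and the SQL:2011 `FOR SYSTEM_TIME AS OF` form, and keeps the version expression as raw text.

```rust
/// Hypothetical, trimmed-down table reference: only the name and the
/// proposed `version` field are modelled (not sqlparser's real type).
#[derive(Debug, PartialEq)]
pub struct VersionedTableFactor {
    pub name: String,
    /// Raw text of the `AS OF` expression, if one was given.
    pub version: Option<String>,
}

/// Naive parser for `<name> [FOR SYSTEM_TIME] AS OF <expr>`.
/// Returns `None` for anything it does not recognise.
pub fn parse_table_ref(input: &str) -> Option<VersionedTableFactor> {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let (name, mut rest) = tokens.split_first()?;

    // Optional SQL:2011 prefix before `AS OF`.
    let has_prefix = rest.len() >= 2
        && rest[0].eq_ignore_ascii_case("FOR")
        && rest[1].eq_ignore_ascii_case("SYSTEM_TIME");
    if has_prefix {
        rest = &rest[2..];
    }

    let version = match rest {
        // Plain table reference: no version requested.
        [] if !has_prefix => None,
        // `AS OF <expr>`: keep the expression verbatim.
        [as_kw, of_kw, expr @ ..]
            if as_kw.eq_ignore_ascii_case("AS")
                && of_kw.eq_ignore_ascii_case("OF")
                && !expr.is_empty() =>
        {
            Some(expr.join(" "))
        }
        // Dangling `FOR SYSTEM_TIME`, incomplete `AS OF`, or other trailing tokens.
        _ => return None,
    };

    Some(VersionedTableFactor { name: name.to_string(), version })
}
```

For example, `parse_table_ref("t FOR SYSTEM_TIME AS OF '2023-08-01'")` yields `version: Some("'2023-08-01'")`, while a plain `t` yields `version: None`.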
   
   On the DataFusion side, besides capturing the parsed version in the
`TableScan` logical plan, I imagine the main (breaking) change would be to
alter the signature of the `SchemaProvider::table` method to something like:
   ```rust
   async fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<dyn TableProvider>>;
   ```
   
   with `TableVersion` being some kind of enum covering timestamps (in various
formats) or literal version numbers, for starters. That would let the
implementer of this trait know which specific table version needs to be loaded
(if any).
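
A minimal sketch of how an implementer might resolve such a lookup, assuming a table whose history is a map from version number to (commit timestamp, data files). Everything here is hypothetical: the `TableVersion` variants, `Snapshot` (standing in for a version-pinned `TableProvider`) and `Catalog` are illustrative only, and the method is synchronous to keep the example free of an async runtime.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical version specifier: a point in time or a literal version.
#[derive(Debug, Clone, PartialEq)]
pub enum TableVersion {
    /// Milliseconds since the Unix epoch (unit chosen for simplicity).
    TimestampMillis(i64),
    /// A literal, table-format-specific version number.
    Version(u64),
}

/// Stand-in for a `TableProvider` pinned to one table version.
#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub version: u64,
    pub files: Vec<String>,
}

/// Toy table history: version -> (commit timestamp in ms, data files).
/// Assumes commit timestamps increase with the version number.
pub struct VersionedTable {
    pub history: BTreeMap<u64, (i64, Vec<String>)>,
}

pub struct Catalog {
    pub tables: BTreeMap<String, VersionedTable>,
}

impl Catalog {
    /// Synchronous analogue of the proposed `SchemaProvider::table` signature.
    pub fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<Snapshot>> {
        let table = self.tables.get(name)?;
        let (v, (_, files)) = match version {
            // No version requested: load the latest table state.
            None => table.history.iter().next_back()?,
            // Exact version lookup.
            Some(TableVersion::Version(v)) => table.history.get_key_value(&v)?,
            // Newest version committed at or before the given timestamp.
            Some(TableVersion::TimestampMillis(ts)) => table
                .history
                .iter()
                .rev()
                .find(|(_, (committed_at, _))| *committed_at <= ts)?,
        };
        Some(Arc::new(Snapshot { version: *v, files: files.clone() }))
    }
}
```

Resolving a timestamp to the newest version committed at or before it matches what time travel typically means; an unknown version, or a timestamp older than the first commit, yields `None`.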
   
   ### Describe alternatives you've considered
   
   The workaround seafowl currently uses is the following:
   
   - abuse the existing table function syntax to smuggle the version
information into a given `sqlparser::ast::Statement`[0]
   - walk over each incoming `sqlparser::ast::Query`, checking whether the
table function syntax was used[1]
   - if it was, rename the table to something unique, and register the
`TableProvider` for the specific table version in a new session context/state[2]
   
   [0] https://seafowl.io/docs/guides/querying-time-travel#querying-older-table-versions
   [1] https://github.com/splitgraph/seafowl/blob/main/src/version.rs#L58-L91
   [2] https://github.com/splitgraph/seafowl/blob/main/src/context.rs#L542-L565
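
The detection and renaming steps above can be sketched roughly as follows, over a toy AST rather than `sqlparser::ast::Query`. `TableFunctionRef`, `rewrite_versioned_refs` and the `<name>_v<index>` naming scheme are hypothetical and not seafowl's actual code; the returned map is what a caller would use to register a version-pinned `TableProvider` under each unique name.

```rust
use std::collections::HashMap;

/// Hypothetical table reference from a toy AST. `args` is non-empty when
/// table-function syntax such as `my_table('2023-08-01')` was used.
#[derive(Debug, Clone, PartialEq)]
pub struct TableFunctionRef {
    pub name: String,
    pub args: Vec<String>,
}

/// Rewrites each versioned reference in place to a unique table name and
/// returns `unique name -> (original table, version)`, i.e. what the caller
/// would register as version-pinned providers in a fresh session context.
pub fn rewrite_versioned_refs(refs: &mut [TableFunctionRef]) -> HashMap<String, (String, String)> {
    let mut registrations = HashMap::new();
    for (i, table_ref) in refs.iter_mut().enumerate() {
        // Detect table-function syntax carrying exactly one version argument.
        if let [version] = table_ref.args.as_slice() {
            // Index-based suffix keeps names unique within one query.
            let unique = format!("{}_v{}", table_ref.name, i);
            registrations.insert(unique.clone(), (table_ref.name.clone(), version.clone()));
            // Point the query at the unique name and drop the smuggled argument.
            table_ref.name = unique;
            table_ref.args.clear();
        }
    }
    registrations
}
```

Called on the references `t('1')` and `u`, it renames the first to `t_v0`, leaves `u` untouched, and returns a single registration `t_v0 -> (t, 1)`.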
   
   ### Additional context
   
   If this is deemed sufficiently important for (and compatible with)
DataFusion, I'd be happy to take on the work needed to implement it.

