ted-jenks opened a new issue, #15699:
URL: https://github.com/apache/iceberg/issues/15699
### Feature Request / Improvement
Spark SQL supports point-in-time queries via `VERSION AS OF` and `TIMESTAMP
AS OF`, but there is no SQL syntax for querying a range of versions or
timestamps. Range queries would be useful for incremental consumption patterns.
The underlying scan infrastructure fully supports this. `IncrementalScan`
provides `fromSnapshotInclusive`, `fromSnapshotExclusive`, and `toSnapshot`.
Today, the `create_changelog_view` procedure can be called from SQL:
```sql
CALL spark_catalog.system.create_changelog_view(
  table => 'db.tbl',
  options => map('start-snapshot-id', '1', 'end-snapshot-id', '2')
);
SELECT * FROM tbl_changes;
```
However, this requires a procedure call that creates a view before it can be
queried; it is not a direct SQL query syntax for version ranges.
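To make the range semantics concrete, here is a minimal sketch of how an inclusive versus exclusive start bound changes which snapshots a range covers. It is a toy model in plain Python, not Iceberg's implementation: `Snapshot`, `snapshots_in_range`, and `HISTORY` are hypothetical names introduced for illustration, and the model assumes a simple linear lineage where each snapshot has one parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not an Iceberg class)."""
    snapshot_id: int
    parent_id: Optional[int]  # None for the first snapshot in the lineage


def snapshots_in_range(
    history: List[Snapshot],
    to_id: int,
    from_id: Optional[int] = None,
    from_inclusive: bool = False,
) -> List[int]:
    """Return the snapshot IDs covered by the range, oldest first.

    Walks parent pointers back from to_id (always included) until from_id
    is reached; from_id itself is included only if from_inclusive is True.
    With from_id=None the walk continues to the first snapshot.
    """
    by_id = {s.snapshot_id: s for s in history}
    if to_id not in by_id:
        raise ValueError(f"unknown end snapshot: {to_id}")

    selected: List[int] = []
    current: Optional[Snapshot] = by_id[to_id]
    found_start = from_id is None
    while current is not None:
        if current.snapshot_id == from_id:
            found_start = True
            if from_inclusive:
                selected.append(current.snapshot_id)
            break
        selected.append(current.snapshot_id)
        # Step to the parent; the first snapshot has no parent.
        if current.parent_id is None:
            current = None
        else:
            current = by_id.get(current.parent_id)

    if not found_start:
        raise ValueError(f"snapshot {from_id} is not an ancestor of {to_id}")
    return selected[::-1]


# A linear lineage of five snapshots: 1 <- 2 <- 3 <- 4 <- 5.
HISTORY = [Snapshot(1, None)] + [Snapshot(i, i - 1) for i in range(2, 6)]

if __name__ == "__main__":
    print(snapshots_in_range(HISTORY, to_id=5, from_id=1, from_inclusive=True))
    print(snapshots_in_range(HISTORY, to_id=5, from_id=1, from_inclusive=False))
```

With this lineage, the inclusive call covers snapshots 1 through 5 and the exclusive call covers 2 through 5. Any range syntax would need to state which of the two it means for its lower bound.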
## Proposal
Add SQL syntax for version/timestamp range queries. Possible forms (open for
discussion):
```sql
-- by snapshot ID
SELECT * FROM db.table VERSION BETWEEN 1 AND 5
-- by timestamp
SELECT * FROM db.table TIMESTAMP BETWEEN '2024-01-01' AND '2024-06-01'
-- by tag/branch ref
SELECT * FROM db.table VERSION BETWEEN 'tag-a' AND 'tag-b'
```
### Query engine
Spark
### Willingness to contribute
- [ ] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time