[
https://issues.apache.org/jira/browse/HIVE-26151?focusedWorklogId=759150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759150
]
ASF GitHub Bot logged work on HIVE-26151:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Apr/22 12:03
Start Date: 20/Apr/22 12:03
Worklog Time Spent: 10m
Work Description: marton-bod commented on code in PR #3222:
URL: https://github.com/apache/hive/pull/3222#discussion_r854053748
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -163,4 +165,32 @@ public static void updateSpec(Configuration configuration,
Table table) {
public static boolean isBucketed(Table table) {
return table.spec().fields().stream().anyMatch(f ->
f.transform().toString().startsWith("bucket["));
}
+
+ /**
+ * Returns the snapshot ID which is immediately before (or exactly at) the timestamp provided in millis.
+ * If the timestamp provided is before the first snapshot of the table, we return an empty optional.
+ * If the timestamp provided is in the future compared to the latest snapshot, we return the latest snapshot ID.
+ *
+ * E.g.: if we have snapshots S1, S2, S3 committed at times T3, T6, T9 respectively (T0 = start of epoch), then:
+ * - from T0 to T2 -> returns empty
+ * - from T3 to T5 -> returns S1
+ * - from T6 to T8 -> returns S2
+ * - from T9 to T∞ -> returns S3
+ *
+ * @param table the table whose snapshot ID we are trying to find
+ * @param time the timestamp provided in milliseconds
+ * @return the snapshot ID corresponding to the time
+ */
+ public static Optional<Long> findSnapshotForTimestamp(Table table, long time) {
+ if (table.history().get(0).timestampMillis() > time) {
+ return Optional.empty();
+ }
+
+ for (Snapshot snapshot : table.snapshots()) {
Review Comment:
It looks like `table.snapshots()` returns the snapshots ordered by commit time.
Whenever there's a commit, Iceberg takes the existing list of snapshots
[here](https://github.com/apache/iceberg/blob/9618147b6de8f8627052a205b86e45263394b0c2/core/src/main/java/org/apache/iceberg/TableMetadata.java#L817),
and simply appends the new snapshot to the end
[here](https://github.com/apache/iceberg/blob/9618147b6de8f8627052a205b86e45263394b0c2/core/src/main/java/org/apache/iceberg/TableMetadata.java#L994).
And since the snapshots are kept in a List, the iteration order is deterministic, so the loop visits them oldest first.
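
To illustrate why the ordering matters for the loop, here is a minimal, self-contained sketch of the lookup. It is not the code from this PR: `SnapshotLookup` and `Snap` are hypothetical names, with `Snap` standing in for Iceberg's `Snapshot`, and the sketch assumes the list is ordered by commit time, oldest first. It reuses the S1/S2/S3 at T3/T6/T9 example from the Javadoc.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SnapshotLookup {

  /** Hypothetical stand-in for Iceberg's Snapshot, reduced to the two fields used here. */
  static final class Snap {
    final long snapshotId;
    final long timestampMillis;

    Snap(long snapshotId, long timestampMillis) {
      this.snapshotId = snapshotId;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * Returns the ID of the last snapshot committed at or before {@code time}.
   * Assumes {@code snapshots} is ordered by commit time, oldest first.
   */
  static Optional<Long> findSnapshotForTimestamp(List<Snap> snapshots, long time) {
    Optional<Long> result = Optional.empty();
    for (Snap snapshot : snapshots) {
      if (snapshot.timestampMillis > time) {
        // Ordered input: every remaining snapshot is newer as well, so we can stop.
        break;
      }
      result = Optional.of(snapshot.snapshotId);
    }
    return result;
  }

  public static void main(String[] args) {
    // S1, S2, S3 committed at T3, T6, T9
    List<Snap> snapshots = Arrays.asList(new Snap(1L, 3L), new Snap(2L, 6L), new Snap(3L, 9L));

    System.out.println(findSnapshotForTimestamp(snapshots, 2L));   // Optional.empty
    System.out.println(findSnapshotForTimestamp(snapshots, 5L));   // Optional[1]
    System.out.println(findSnapshotForTimestamp(snapshots, 6L));   // Optional[2]
    System.out.println(findSnapshotForTimestamp(snapshots, 100L)); // Optional[3]
  }
}
```

The early `break` is only correct if the ordering assumption holds. If the list could be unordered, the loop would have to scan every snapshot and keep the one with the largest timestamp not exceeding `time`.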
Issue Time Tracking
-------------------
Worklog Id: (was: 759150)
Time Spent: 1h 10m (was: 1h)
> Support range-based time travel queries for Iceberg
> ---------------------------------------------------
>
> Key: HIVE-26151
> URL: https://issues.apache.org/jira/browse/HIVE-26151
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Allow querying which records have been inserted during a certain time window
> for Iceberg tables. The Iceberg TableScan API provides an implementation for
> that, so most of the work would go into adding syntax support and
> transporting the startTime and endTime parameters to the Iceberg input format.
> Proposed new syntax:
> SELECT * FROM table FOR SYSTEM_TIME FROM '<startTime>' TO '<endTime>'
> SELECT * FROM table FOR SYSTEM_VERSION FROM <startVersion> TO <endVersion>
> (the TO clause is optional in both cases)
--
This message was sent by Atlassian Jira
(v8.20.7#820007)