[
https://issues.apache.org/jira/browse/HIVE-26151?focusedWorklogId=759066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759066
]
ASF GitHub Bot logged work on HIVE-26151:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Apr/22 09:26
Start Date: 20/Apr/22 09:26
Worklog Time Spent: 10m
Work Description: marton-bod commented on code in PR #3222:
URL: https://github.com/apache/hive/pull/3222#discussion_r853924488
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -207,6 +218,39 @@ public RecordReader<Void, T> createRecordReader(InputSplit split, TaskAttemptCon
     return new IcebergRecordReader<>();
   }
+  private static TableScan scanWithTimeRange(Table table, Configuration conf, TableScan scan, long fromTime) {
+    // let's find the corresponding snapshot ID - if the fromTime is before the table creation happened, let's use
+    // the first snapshot of the table
+    long fromSnapshot = IcebergTableUtil.findSnapshotForTimestamp(table, fromTime)
+        .orElseGet(() -> table.history().get(0).snapshotId());
+    if (fromSnapshot == table.currentSnapshot().snapshotId()) {
+      throw new IllegalArgumentException(
+          "Provided FROM timestamp must be earlier than the latest snapshot of the table.");
+    }
+    long toTime = conf.getLong(InputFormatConfig.TO_TIMESTAMP, -1);
Review Comment:
Sure
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -207,6 +218,39 @@ public RecordReader<Void, T> createRecordReader(InputSplit split, TaskAttemptCon
     return new IcebergRecordReader<>();
   }
+  private static TableScan scanWithTimeRange(Table table, Configuration conf, TableScan scan, long fromTime) {
+    // let's find the corresponding snapshot ID - if the fromTime is before the table creation happened, let's use
+    // the first snapshot of the table
+    long fromSnapshot = IcebergTableUtil.findSnapshotForTimestamp(table, fromTime)
+        .orElseGet(() -> table.history().get(0).snapshotId());
+    if (fromSnapshot == table.currentSnapshot().snapshotId()) {
+      throw new IllegalArgumentException(
+          "Provided FROM timestamp must be earlier than the latest snapshot of the table.");
+    }
+    long toTime = conf.getLong(InputFormatConfig.TO_TIMESTAMP, -1);
+    if (toTime != -1) {
+      if (fromTime >= toTime) {
Review Comment:
Yep, makes sense
Issue Time Tracking
-------------------
Worklog Id: (was: 759066)
Time Spent: 40m (was: 0.5h)
> Support range-based time travel queries for Iceberg
> ---------------------------------------------------
>
> Key: HIVE-26151
> URL: https://issues.apache.org/jira/browse/HIVE-26151
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Allow querying which records have been inserted during a certain time window
> for Iceberg tables. The Iceberg TableScan API provides an implementation for
> that, so most of the work would go into adding syntax support and
> transporting the startTime and endTime parameters to the Iceberg input format.
> Proposed new syntax:
> SELECT * FROM table FOR SYSTEM_TIME FROM '<startTime>' TO '<endTime>'
> SELECT * FROM table FOR SYSTEM_VERSION FROM <startVersion> TO <endVersion>
> (the TO clause is optional in both cases)
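
For illustration only, the range checks visible in the quoted diff can be sketched without any Iceberg dependency. Everything here is an assumption rather than the PR's actual code: the Entry class and both helpers are hypothetical stand-ins, findSnapshotForTimestamp is assumed to return the latest snapshot committed at or before the given time, and the exception for fromTime >= toTime is a guess, since the diff is cut off before that branch's body.

```java
import java.util.List;
import java.util.Optional;

public class TimeRangeSketch {

  // Hypothetical stand-in for a table history entry: snapshot ID plus commit time in millis.
  static final class Entry {
    final long snapshotId;
    final long timestampMillis;

    Entry(long snapshotId, long timestampMillis) {
      this.snapshotId = snapshotId;
      this.timestampMillis = timestampMillis;
    }
  }

  // Assumed semantics: the latest snapshot committed at or before 'time', if any.
  // History is expected to be ordered oldest to newest.
  static Optional<Long> findSnapshotForTimestamp(List<Entry> history, long time) {
    Optional<Long> result = Optional.empty();
    for (Entry e : history) {
      if (e.timestampMillis <= time) {
        result = Optional.of(e.snapshotId);
      }
    }
    return result;
  }

  // Mirrors the checks shown in the diff; toTime == -1 means "no TO clause".
  static long resolveFromSnapshot(List<Entry> history, long fromTime, long toTime) {
    // Fall back to the first snapshot when fromTime predates the table.
    long fromSnapshot = findSnapshotForTimestamp(history, fromTime)
        .orElseGet(() -> history.get(0).snapshotId);
    long currentSnapshot = history.get(history.size() - 1).snapshotId;
    if (fromSnapshot == currentSnapshot) {
      throw new IllegalArgumentException(
          "Provided FROM timestamp must be earlier than the latest snapshot of the table.");
    }
    // Assumption: an empty or inverted window is rejected (message is hypothetical).
    if (toTime != -1 && fromTime >= toTime) {
      throw new IllegalArgumentException("FROM timestamp must be earlier than TO timestamp.");
    }
    return fromSnapshot;
  }

  public static void main(String[] args) {
    List<Entry> history = List.of(new Entry(1L, 100L), new Entry(2L, 200L), new Entry(3L, 300L));
    System.out.println(resolveFromSnapshot(history, 150L, -1L));
  }
}
```

In this sketch a FROM time earlier than the first commit resolves to the first snapshot, while a FROM time at or after the latest commit is rejected because no later snapshot remains to scan.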