This is an automated email from the ASF dual-hosted git repository.
asherman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 06eb62d3e IMPALA-12197: Prevent assertion failures when
isClusteringColumn() is called on an IcebergTimeTravelTable.
06eb62d3e is described below
commit 06eb62d3efa1c94810c4276f90896fa62205a49b
Author: Andrew Sherman <[email protected]>
AuthorDate: Thu Jun 8 14:27:00 2023 -0700
IMPALA-12197: Prevent assertion failures when isClusteringColumn() is
called on an IcebergTimeTravelTable.
In local catalog mode, a query that generates a runtime filter for an
Iceberg time travel table may fail with "ERROR:
IllegalArgumentException: null".
In the planner, an Iceberg table that is accessed with Time Travel
is represented by an IcebergTimeTravelTable object. This object
represents a time-based variation on a base table. The
IcebergTimeTravelTable may have a different schema from the base
table, which it handles by tracking its own set of Columns. As part of
generating a runtime filter, the isClusteringColumn() method is called
on the table. IcebergTimeTravelTable was delegating this call to the
base table. In local catalog mode this method is implemented by
LocalTable, which has a Preconditions check (an assertion) that the
column parameter matches the stored column. In this case the check
fails because the base table and the time travel table each have
their own distinct set of Column objects.
The fix is to have IcebergTimeTravelTable provide its own
isClusteringColumn() method. Iceberg tables have no clustering
columns, so this method simply returns false.
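The failure mode and the fix can be illustrated with a small,
self-contained toy model. This is only a sketch: every class below is a
hypothetical stand-in rather than Impala's real catalog code, and the
identity comparison in the base table is an assumption modeled on the
checkArgument() call added by this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, not Impala's real catalog classes.
final class ClusteringColumnSketch {
  /** Minimal column: a name plus its position in the owning table. */
  static final class Column {
    final String name;
    final int position;
    Column(String name, int position) {
      this.name = name;
      this.position = position;
    }
  }

  /** Stand-in for the base table (the role LocalTable plays in local catalog mode). */
  static class BaseTable {
    final List<Column> cols = new ArrayList<>();
    BaseTable(String... names) {
      for (int i = 0; i < names.length; i++) cols.add(new Column(names[i], i));
    }
    boolean isClusteringColumn(Column c) {
      // Assumed identity check; a failure carries no message, hence "null".
      if (cols.get(c.position) != c) throw new IllegalArgumentException();
      return false; // the toy base table has no clustering columns
    }
  }

  /** Time travel view before the fix: own columns, but the call goes to the base table. */
  static class DelegatingTimeTravelTable {
    final BaseTable base;
    final List<Column> colsByPos = new ArrayList<>();
    DelegatingTimeTravelTable(BaseTable base) {
      this.base = base;
      // Same names and positions as the base table, but distinct Column objects.
      for (Column c : base.cols) colsByPos.add(new Column(c.name, c.position));
    }
    boolean isClusteringColumn(Column c) {
      return base.isClusteringColumn(c);
    }
  }

  /** Time travel view after the fix: validates against its own columns. */
  static final class FixedTimeTravelTable extends DelegatingTimeTravelTable {
    FixedTimeTravelTable(BaseTable base) {
      super(base);
    }
    @Override
    boolean isClusteringColumn(Column c) {
      if (colsByPos.get(c.position) != c) throw new IllegalArgumentException();
      return false;
    }
  }

  /** True if the pre-fix delegation rejects the time travel table's own column. */
  static boolean delegationFails() {
    DelegatingTimeTravelTable t =
        new DelegatingTimeTravelTable(new BaseTable("uniquecarrier", "year"));
    try {
      t.isClusteringColumn(t.colsByPos.get(0));
      return false;
    } catch (IllegalArgumentException e) {
      return e.getMessage() == null;
    }
  }

  /** Result of the post-fix method for the time travel table's own column. */
  static boolean fixedResult() {
    FixedTimeTravelTable t =
        new FixedTimeTravelTable(new BaseTable("uniquecarrier", "year"));
    return t.isClusteringColumn(t.colsByPos.get(0));
  }

  public static void main(String[] args) {
    System.out.println("delegation fails: " + delegationFails());
    System.out.println("fixed result: " + fixedResult());
  }
}
```

In this toy model the delegating variant throws an
IllegalArgumentException with a null message, while the fixed variant
accepts its own column and returns false.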
TESTING
- Ran all end-to-end tests.
- Added test case for query that failed.
Change-Id: I51d04c8757fb48bd417248492d4615ac58085632
Reviewed-on: http://gerrit.cloudera.org:8080/20034
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
.../org/apache/impala/catalog/IcebergTimeTravelTable.java | 9 +++++++--
.../queries/QueryTest/iceberg-time-travel.test | 15 +++++++++++++++
tests/query_test/test_iceberg.py | 3 +++
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java b/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
index 381c929b6..521ac75fa 100644
--- a/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
@@ -75,8 +75,7 @@ public class IcebergTimeTravelTable
// The Time Travel parameters that control the schema for the table.
private final TimeTravelSpec timeTravelSpec_;
- // colsByPos[i] refers to the ith column in the table. The first numClusteringCols are
- // the clustering columns.
+ // colsByPos[i] refers to the ith column in the table.
protected final ArrayList<Column> colsByPos_ = new ArrayList<>();
// map from lowercase column name to Column object.
@@ -156,6 +155,12 @@ public class IcebergTimeTravelTable
return colsByPos_;
}
+ @Override
+ public boolean isClusteringColumn(Column c) {
+ Preconditions.checkArgument(colsByPos_.get(c.getPosition()) == c);
+ return false;
+ }
+
@Override
public TTableDescriptor toThriftDescriptor(
int tableId, Set<Long> referencedPartitions) {
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test
new file mode 100644
index 000000000..6518abf08
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test
@@ -0,0 +1,15 @@
+====
+---- QUERY
+# Time travel query that tickles bug IMPALA-12197.
+create table iceberg_flights (uniquecarrier string) partitioned by (year int) stored as iceberg;
+create table iceberg_airlines (code string) stored as iceberg;
+insert into iceberg_flights(uniquecarrier, year) values('ba', 1966);
+insert into iceberg_airlines(code) values('ba');
+WITH dist_flights AS
+( SELECT DISTINCT f1.uniquecarrier AS carrier FROM iceberg_flights FOR SYSTEM_TIME AS OF '2040-12-31 00:00:00.000' f1)
+SELECT * FROM dist_flights JOIN iceberg_airlines a ON dist_flights.carrier = a.code;
+---- RESULTS
+'ba','ba'
+---- TYPES
+STRING,STRING
+====
diff --git a/tests/query_test/test_iceberg.py b/tests/query_test/test_iceberg.py
index e8c2ed522..b5b501965 100644
--- a/tests/query_test/test_iceberg.py
+++ b/tests/query_test/test_iceberg.py
@@ -641,6 +641,9 @@ class TestIcebergTable(IcebergTestSuite):
except Exception as e:
assert "Cannot find a snapshot older than" in str(e)
+ def test_time_travel_queries(self, vector, unique_database):
+ self.run_test_case('QueryTest/iceberg-time-travel', vector, use_db=unique_database)
+
@SkipIf.not_dfs
def test_strings_utf8(self, vector, unique_database):
# Create table