This is an automated email from the ASF dual-hosted git repository.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 78054727e IMPALA-11807: Rewrite iceberg metadata if not on hdfs
78054727e is described below
commit 78054727e42d81d30d8ac9bc61c6f92bd5504f11
Author: Gergely Fürnstáhl <[email protected]>
AuthorDate: Thu Jan 19 15:08:36 2023 +0100
IMPALA-11807: Rewrite iceberg metadata if not on hdfs
Iceberg test tables are usually written on hdfs and the file paths start
with "hdfs://localhost:20500/test-warehouse".
Earlier we manually transformed the metadata so paths would start with
"/test-warehouse".
Since IMPALA-11821, testdata/bin/rewrite-iceberg-metadata.py supports
not only a custom WAREHOUSE_LOCATION_PREFIX, but also the ability to trim
the beginning of the file paths.
This commit modifies the data load so that the metadata rewrite always
runs when the target filesystem is not hdfs, even when
WAREHOUSE_LOCATION_PREFIX is empty.
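As a rough illustration of the path handling described above, the sketch
below strips the "hdfs://localhost:20500" part from a location and prepends
an optional prefix. It is not the code in rewrite-iceberg-metadata.py: the
names rewrite_location and HDFS_PREFIX, the example table path, and the
"s3a://bucket" prefix are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not part of the Impala tree.
# Scheme and authority that the test metadata is assumed to start with.
HDFS_PREFIX="hdfs://localhost:20500"

# rewrite_location PREFIX PATH
# Removes HDFS_PREFIX from the start of PATH (if present) and prepends PREFIX.
# PREFIX may be the empty string, in which case the path is only trimmed.
rewrite_location() {
  local prefix="$1" path="$2"
  printf '%s%s\n' "${prefix}" "${path#"${HDFS_PREFIX}"}"
}

# Empty prefix: the location is only trimmed.
rewrite_location "" "hdfs://localhost:20500/test-warehouse/iceberg_test/t/metadata"
# Non-empty prefix: the location is trimmed and then prefixed.
rewrite_location "s3a://bucket" "hdfs://localhost:20500/test-warehouse/iceberg_test/t/metadata"
```

The empty-prefix call shows why the prefix has to be passed as a quoted
argument: unquoted, an empty variable would disappear from the command line
and the path would be taken as the first argument.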
Testing:
- Ran iceberg tests on ozone and S3
Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
Reviewed-on: http://gerrit.cloudera.org:8080/19432
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
testdata/bin/load-test-warehouse-snapshot.sh | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/testdata/bin/load-test-warehouse-snapshot.sh b/testdata/bin/load-test-warehouse-snapshot.sh
index 11f60a237..ff73d11eb 100755
--- a/testdata/bin/load-test-warehouse-snapshot.sh
+++ b/testdata/bin/load-test-warehouse-snapshot.sh
@@ -113,10 +113,12 @@ if [ ! -f ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/githash.txt ]; then
exit 1
fi
-if [ "${WAREHOUSE_LOCATION_PREFIX}" != "" ]; then
+if [ "${TARGET_FILESYSTEM}" != "hdfs" ]; then
+ # Need to rewrite test metadata regardless of ${WAREHOUSE_LOCATION_PREFIX} because
+ # paths can have "hdfs://" scheme
echo "Updating Iceberg locations with warehouse prefix ${WAREHOUSE_LOCATION_PREFIX}"
- ${IMPALA_HOME}/testdata/bin/rewrite-iceberg-metadata.py ${WAREHOUSE_LOCATION_PREFIX} \
- $(find ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/iceberg_test -name "metadata")
+ ${IMPALA_HOME}/testdata/bin/rewrite-iceberg-metadata.py "${WAREHOUSE_LOCATION_PREFIX}" \
+ $(find ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/ -name "metadata")
fi
echo "Copying data to ${TARGET_FILESYSTEM}"