This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 78054727e IMPALA-11807: Rewrite iceberg metadata if not on hdfs
78054727e is described below

commit 78054727e42d81d30d8ac9bc61c6f92bd5504f11
Author: Gergely Fürnstáhl <[email protected]>
AuthorDate: Thu Jan 19 15:08:36 2023 +0100

    IMPALA-11807: Rewrite iceberg metadata if not on hdfs
    
    Iceberg test tables are usually written on hdfs and the file paths start
    with "hdfs://localhost:20500/test-warehouse".
    
    Earlier we manually transformed the metadata so paths would start with
    "/test-warehouse".
    
    Since IMPALA-11821, testdata/bin/rewrite-iceberg-metadata.py supports
    not only a custom WAREHOUSE_LOCATION_PREFIX but also trimming the
    beginning of the file paths.
    
    This commit modifies the data load, so metadata rewrite always executes
    if not on hdfs, even with empty WAREHOUSE_LOCATION_PREFIX.
    
    Testing:
      - Ran iceberg tests on ozone and S3
    
    Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
    Reviewed-on: http://gerrit.cloudera.org:8080/19432
    Reviewed-by: Michael Smith <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 testdata/bin/load-test-warehouse-snapshot.sh | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/testdata/bin/load-test-warehouse-snapshot.sh b/testdata/bin/load-test-warehouse-snapshot.sh
index 11f60a237..ff73d11eb 100755
--- a/testdata/bin/load-test-warehouse-snapshot.sh
+++ b/testdata/bin/load-test-warehouse-snapshot.sh
@@ -113,10 +113,12 @@ if [ ! -f ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/githash.txt ]; then
   exit 1
 fi
 
-if [ "${WAREHOUSE_LOCATION_PREFIX}" != "" ]; then
+if [ "${TARGET_FILESYSTEM}" != "hdfs" ]; then
+  # Need to rewrite test metadata regardless of ${WAREHOUSE_LOCATION_PREFIX} because
+  # paths can have "hdfs://" scheme
   echo "Updating Iceberg locations with warehouse prefix ${WAREHOUSE_LOCATION_PREFIX}"
-  ${IMPALA_HOME}/testdata/bin/rewrite-iceberg-metadata.py ${WAREHOUSE_LOCATION_PREFIX} \
-      $(find ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/iceberg_test -name "metadata")
+  ${IMPALA_HOME}/testdata/bin/rewrite-iceberg-metadata.py "${WAREHOUSE_LOCATION_PREFIX}" \
+      $(find ${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/ -name "metadata")
 fi
 
 echo "Copying data to ${TARGET_FILESYSTEM}"
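
A note on why the patch quotes "${WAREHOUSE_LOCATION_PREFIX}": the rewrite now also runs when the prefix is empty, and an unquoted empty variable is dropped by the shell, so the first directory would end up in the prefix position. The sketch below illustrates that effect, plus the kind of path trimming the commit message describes. Both helpers (count_args, rewrite_path) and the hard-coded "hdfs://localhost:20500" root are assumptions made for illustration; they are not taken from rewrite-iceberg-metadata.py, whose actual logic may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; these helpers are hypothetical and are not part of
# the Impala tree or of rewrite-iceberg-metadata.py.

# Report how many arguments a command actually receives.
count_args() {
  echo "$#"
}

# Strip an assumed "hdfs://localhost:20500" root from a path, then prepend
# the (possibly empty) warehouse prefix.
rewrite_path() {
  local prefix="$1"
  local path="$2"
  local hdfs_root="hdfs://localhost:20500"
  path="${path#"${hdfs_root}"}"
  printf '%s%s\n' "${prefix}" "${path}"
}

EMPTY_PREFIX=""

# Unquoted, the empty prefix disappears and the directory becomes argument 1.
count_args ${EMPTY_PREFIX} /tmp/metadata      # prints 1
# Quoted, the empty prefix is still passed as its own (empty) argument.
count_args "${EMPTY_PREFIX}" /tmp/metadata    # prints 2

# With an empty prefix, only the scheme and authority are trimmed.
rewrite_path "${EMPTY_PREFIX}" "hdfs://localhost:20500/test-warehouse/t/metadata"
# prints /test-warehouse/t/metadata
```

With a non-empty prefix such as "s3a://bucket" (a made-up value), rewrite_path would return "s3a://bucket/test-warehouse/t/metadata" for the same input.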
