This is an automated email from the ASF dual-hosted git repository.

laszlog pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit e6078b42819b3642d0030992f77ff030abf2db9f
Author: Laszlo Gaal <[email protected]>
AuthorDate: Wed Sep 11 19:22:23 2024 +0200

    IMPALA-13825: Extend Docker container build to custom base images
    
    Downstream system vendors, users and customers have lately expressed
    interest in consuming Impala in containerized forms, taking advantage of
    various specialized, hardened container base images, such as those
    based on the Wolfi project by Chainguard;
    see: https://github.com/wolfi-dev.
    
    This patch enables Impala container images to be built on top of custom
    base images, and adds an implementation example that uses the publicly
    available Wolfi base image.
    
    Building a customized Docker image follows a hybrid approach. Instead of
    replicating the complete Impala build process inside a Wolfi container
    for a fully native binary build, it relies on an existing build platform
    that is compatible with the binary packages available inside the custom
    container image. For Wolfi the Impala binaries are supplied by the
    Red Hat 9 build of Impala. This is made possible by the fact that major
    library dependencies of Impala have the same versions on Wolfi OS and
    Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
    with no changes.
    
    The binaries produced by the regular build process are then installed
    into a Docker image built on top of an explicitly specified custom base
    image. The selection of a custom base image is controlled by two
    environment variables:
    - USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
      If set to 'true', triggers the use of the custom image.
      When set to 'false' or left unspecified, the Docker base image is
      selected by the existing logic of matching the build platform's
      operating system.
    - IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
    
    These variables can be set in the environment, in
    impala-config-branch.sh, or in impala-config-local.sh.
    They are reported at the end of bin/impala-config.sh where important
    environment variables are listed. They are also added to the list of
    variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
    that they can be used in the context of Jenkins jobs as well.
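
    The selection rule described above can be sketched in Python as follows.
    This is an illustration only: pick_base_image() and its parameters are
    hypothetical names, not part of the patch, and the real logic lives in
    docker/CMakeLists.txt.

```python
# Hypothetical helper, not part of the patch: mirrors the rule that the custom
# base image is used only when USE_CUSTOM_IMPALA_BASE_IMAGE is exactly "true".
DEFAULT_CUSTOM_BASE = "cgr.dev/chainguard/wolfi-base:latest"


def pick_base_image(env, platform_default):
    """Return the Docker base image for the given environment mapping.

    env is a dict-like object such as os.environ; platform_default is the
    image that matching the build platform's operating system would select.
    """
    if env.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false") == "true":
        # Fall back to the default named in bin/impala-config.sh.
        return env.get("IMPALA_CUSTOM_DOCKER_BASE", DEFAULT_CUSTOM_BASE)
    return platform_default
```

    With the flag unset or set to 'false', the function returns the
    platform-matched image unchanged.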
    
    The unified script that installs Impala's required dependencies into the
    container image is extended for Wolfi to handle APK packages.
    
    A new script is added to install Bash in the Docker image if it is
    missing. Impala build scripts (including the scripts used during Docker
    image builds) as well as container startup scripts require Bash,
    but minimal container base images usually omit it, favoring a smaller
    alternative.
    
    To improve the debugging experience for a containerized Impala
    minicluster, the minicluster starter script bin/start-impala-cluster.py
    is extended with the following features:
    
    - synchronizes every launched container's timezone to the host.
      This is needed for the Iceberg time-travel tests, which create
      timestamped Iceberg metadata items in the impalad context inside a
      container, but check creation/modification times of the same items in
      the test scripts running on the host, outside the containers. The test
      scripts have the implicit expectation that the same local time is
      shared across all these contexts, but this is not necessarily true if
      the host where the tests are running is set to a timezone other than UTC.
    
      Time synchronization is achieved by injecting the TZ environment
      variable into the container, holding the name of the timezone used
      on the host. The timezone name is taken either from the host's TZ
      variable (if set), or from the host's /etc/localtime symlink,
      checking the name of the timezone file it points to.
      In case /etc/localtime is not a symlink (and TZ is not set on the
      host), the host's /etc/localtime file is mounted into the container.
    
    - sets up a directory for each container to collect the Java VM's error
      files (hs_err_pidNNNN.log) from the containers.
    
    - adds the --mount_sources command line parameter, which mounts the
      complete $IMPALA_HOME subtree into the container at
      /opt/impala/sources to make source code available inside the container
      for easier debugging.
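
    The timezone resolution order described in the first item above can be
    sketched as a standalone function. This is a simplified illustration, not
    the code in bin/start-impala-cluster.py: resolve_timezone() and its
    parameters are hypothetical, and the paths are parameters only so that
    the sketch can be exercised without touching the real /etc/localtime.

```python
import os


def resolve_timezone(env, localtime="/etc/localtime",
                     zoneinfo_dir="/usr/share/zoneinfo"):
    """Return (tz_name, mount_localtime) for a container launch.

    tz_name is the value to inject as TZ, or None if no name is available.
    mount_localtime is True when the host file should be bind-mounted instead.
    """
    tz = env.get("TZ")
    if tz is not None:
        # The host's TZ variable takes precedence.
        return tz, False
    if os.path.islink(localtime):
        # Resolve the symlink and strip the zoneinfo prefix to get the name.
        target = os.path.realpath(localtime)
        return os.path.relpath(target, os.path.realpath(zoneinfo_dir)), False
    if os.path.isfile(localtime):
        # A regular file carries no timezone name, so it has to be mounted.
        return None, True
    # Nothing usable: the container keeps its own default timezone.
    return None, False
```

    Returning both values from one function keeps the two mechanisms (TZ
    injection and bind mount) mutually exclusive by construction.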
    
    Tested by running core-mode tests in the following environments:
    - Regular run (impalad running natively on the platform) on Ubuntu 20.04
    - Regular run on Rocky Linux 9.2
    - Dockerised run (impalad instances running in their individual
      containers) using Ubuntu 20.04 containers
    - Dockerised run (impalad instances running in their individual
      containers) using Rocky Linux 9.2 containers
    - Dockerised run (impalad instances running in their individual
      containers) using Wolfi's wolfi-base containers
    
    Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
    Reviewed-on: http://gerrit.cloudera.org:8080/22583
    Reviewed-by: Laszlo Gaal <[email protected]>
    Reviewed-by: Csaba Ringhofer <[email protected]>
    Reviewed-by: Jason Fehr <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 bin/impala-config.sh                               | 11 ++++
 .../dockerized-impala-bootstrap-and-test.sh        |  3 +-
 bin/start-impala-cluster.py                        | 60 ++++++++++++++++++
 docker/CMakeLists.txt                              | 69 ++++++++++++--------
 docker/daemon_entrypoint.sh                        | 22 ++++++-
 docker/docker-build.sh                             |  1 +
 docker/impala_base/Dockerfile                      | 10 ++-
 docker/impala_profile_tool/Dockerfile              | 13 +++-
 docker/install_bash_if_needed.sh                   | 74 ++++++++++++++++++++++
 docker/install_os_packages.sh                      | 65 ++++++++++++++++++-
 docker/setup_build_context.py                      |  5 ++
 tests/common/impala_connection.py                  |  4 +-
 12 files changed, 298 insertions(+), 39 deletions(-)

diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 689a3a069..b86c46393 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -297,6 +297,15 @@ export IMPALA_DATASKETCHES_VERSION=6.0.0
 export IMPALA_REDHAT7_DOCKER_BASE=${IMPALA_REDHAT7_DOCKER_BASE:-"centos:centos7.9.2009"}
 export IMPALA_REDHAT8_DOCKER_BASE=${IMPALA_REDHAT8_DOCKER_BASE:-"rockylinux:8.5"}
 export IMPALA_REDHAT9_DOCKER_BASE=${IMPALA_REDHAT9_DOCKER_BASE:-"rockylinux:9.2"}
+# Some users may want to use special, hardened base images for increased security.
+# These images are usually not related to the OS where the build is running.
+# The following environment variables allow a specific base image to be specified
+# directly, without relying on the implicit build platform identification in
+# CMakeLists.txt.
+# Images published by Chainguard and the Wolfi project are known to be used, so the
+# publicly available Wolfi base image is used as a default example.
+export IMPALA_CUSTOM_DOCKER_BASE=${IMPALA_CUSTOM_DOCKER_BASE:-"cgr.dev/chainguard/wolfi-base:latest"}
+export USE_CUSTOM_IMPALA_BASE_IMAGE=${USE_CUSTOM_IMPALA_BASE_IMAGE:-false}
 
 # Selects the version of Java to use when start-impala-cluster.py starts with container
 # images (created via e.g. 'make docker_debug_java11_images'). The Java version used in
@@ -1230,6 +1239,8 @@ echo "IMPALA_SYSTEM_PYTHON2   = $IMPALA_SYSTEM_PYTHON2"
 echo "IMPALA_SYSTEM_PYTHON3   = $IMPALA_SYSTEM_PYTHON3"
 echo "IMPALA_BUILD_THREADS    = $IMPALA_BUILD_THREADS"
 echo "NUM_CONCURRENT_TESTS    = $NUM_CONCURRENT_TESTS"
+echo "USE_CUSTOM_IMPALA_BASE_IMAGE = $USE_CUSTOM_IMPALA_BASE_IMAGE"
+echo "IMPALA_CUSTOM_DOCKER_BASE    = $IMPALA_CUSTOM_DOCKER_BASE"
 
 # Kerberos things.  If the cluster exists and is kerberized, source
 # the required environment.  This is required for any hadoop tool to
diff --git a/bin/jenkins/dockerized-impala-bootstrap-and-test.sh b/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
index 07ab2612d..d2493de49 100755
--- a/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
+++ b/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
@@ -36,7 +36,8 @@ source ./bin/bootstrap_system.sh
 # to preserve additional variables.
 ./bin/jenkins/dockerized-impala-preserve-vars.py \
     EE_TEST EE_TEST_FILES JDBC_TEST EXPLORATION_STRATEGY CMAKE_BUILD_TYPE \
-    IMPALA_DOCKER_JAVA
+    IMPALA_DOCKER_JAVA IMPALA_CUSTOM_DOCKER_BASE USE_CUSTOM_IMPALA_BASE_IMAGE \
+    IMPALA_TOOLCHAIN_HOST
 
 # Execute the tests using su to re-login so that group change made above
 # setup_docker takes effect. This does a full re-login and does not stay
diff --git a/bin/start-impala-cluster.py b/bin/start-impala-cluster.py
index 725525d5b..292604b3f 100755
--- a/bin/start-impala-cluster.py
+++ b/bin/start-impala-cluster.py
@@ -54,6 +54,7 @@ KUDU_MASTER_HOSTS = os.getenv("KUDU_MASTER_HOSTS", "127.0.0.1")
 DEFAULT_IMPALA_MAX_LOG_FILES = os.environ.get("IMPALA_MAX_LOG_FILES", 10)
 INTERNAL_LISTEN_HOST = os.getenv("INTERNAL_LISTEN_HOST", "localhost")
 TARGET_FILESYSTEM = os.getenv("TARGET_FILESYSTEM") or "hdfs"
+HOST_TZ = os.getenv("TZ", None)
 
 # Options
 parser = OptionParser()
@@ -133,6 +134,9 @@ parser.add_option("--docker_auto_ports", dest="docker_auto_ports",
                        "(Beewax, HS2, Web UIs, etc), which avoids collisions with other "
                        "running processes. If false, ports are mapped to the same ports "
                        "on localhost as the non-docker impala cluster.")
+parser.add_option("--mount_sources", dest="mount_sources", action="store_true",
+                  help="Mount the $IMPALA_HOME directory as /opt/impala/sources into "
+                       "the containers for easier debugging.")
 parser.add_option("--data_cache_dir", dest="data_cache_dir", default=None,
                   help="This specifies a base directory in which the IO data cache will "
                        "use.")
@@ -254,6 +258,9 @@ def build_java_tool_options(jvm_debug_port=None):
   """Construct the value of the JAVA_TOOL_OPTIONS environment variable to pass to
   daemons."""
   java_tool_options = ""
+  # In a Docker container the Java error file location is always fixed.
+  if options.docker_network is not None:
+    java_tool_options = "-XX:ErrorFile=/opt/impala/java-error/hs_err_pid_%p.log"
   if jvm_debug_port is not None:
     java_tool_options = ("-agentlib:jdwp=transport=dt_socket,address={debug_port}," +
         "server=y,suspend=n ").format(debug_port=jvm_debug_port) + java_tool_options
@@ -988,6 +995,43 @@ class DockerMiniClusterOperations(object):
     env_args = ["-e", "HADOOP_USER_NAME={0}".format(getpass.getuser()),
                 "-e", "JAVA_TOOL_OPTIONS={0}".format(
                     build_java_tool_options(DEFAULT_IMPALAD_JVM_DEBUG_PORT))]
+
+    # Calculate the timezone to pass into the container.
+    # Mounting /etc/localtime into the container does not work when /etc/localtime is a
+    # symbolic link to a real timezone file inside /usr/share/zoneinfo: Linux resolves
+    # the symlink before performing the bind mount, so you can't create a symlink within
+    # the container.
+    # Set the timezone by injecting the TZ environment variable with the desired timezone
+    # string instead. Initialize the env var from the host's TZ variable if it exists,
+    # or calculate the value (the timezone specifier) from the name of the timezone file
+    # pointed to by /etc/localtime, if it is a symlink.
+    # If /etc/localtime is a real file, and TZ is undefined on the host, then mount
+    # /etc/localtime into the container
+    timezone_as_env_var = True
+    timezone_as_mount = False
+
+    if HOST_TZ is None:
+      try:
+        if os.path.islink("/etc/localtime"):
+          # This is a symlink, so figure out where it points, cut the prefix, and hope
+          # we'll get a timezone spec. Don't confuse realpath() and relpath() here!
+          timezone_string = os.path.realpath("/etc/localtime")
+          timezone_string = os.path.relpath(timezone_string, "/usr/share/zoneinfo")
+        elif os.path.isfile("/etc/localtime"):
+          # This is a real file, and we'll just have to mount it into the container
+          timezone_as_env_var = False
+          timezone_as_mount = True
+        else:
+          timezone_as_env_var = False
+          timezone_as_mount = False
+          LOG.warning("Unable to determine local timezone, "
+              "containers will user their default timezones.")
+      except OSError as ex:
+        timezone_as_env_var = False
+        LOG.error("Unable to map /etc/localtime to a timezone name. Reported error"
+                  "is {0}".format(ex))
+    if timezone_as_env_var:
+      env_args += ["-e", "TZ={0}".format(timezone_string)]
     # The container build processes tags the generated image with the daemon name.
     debug_build = options.build_type == "debug" or (options.build_type == "latest" and
         os.path.basename(os.path.dirname(os.readlink("be/build/latest"))) == "debug")
@@ -1009,6 +1053,22 @@ class DockerMiniClusterOperations(object):
     if not os.path.isdir(log_dir):
       os.makedirs(log_dir)
     mount_args += ["--mount", "type=bind,src={0},dst=/opt/impala/logs".format(log_dir)]
+    # Collect Java error files hs_err_pidNNN.log in a unique subdirectory per daemon to
+    # avoid any potential interaction between containers, which should be isolated.
+    java_error_dir = os.path.join(IMPALA_HOME, options.log_dir, host_name, "java-error")
+    if not os.path.isdir(java_error_dir):
+      os.makedirs(java_error_dir)
+    mount_args += ["--mount", "type=bind,src={0},dst=/opt/impala/java-errors".format(
+        java_error_dir)]
+    # If /etc/localtime was found to be a real file instead of a symlink, then mount it
+    # into the container to ensure consistent clocks between the host and the Impala
+    # containers. This is important for logs as well as Iceberg tests.
+    if timezone_as_mount:
+      mount_args += ["--mount",
+          "type=bind,src=/etc/localtime,dst=/etc/localtime,readonly"]
+    if options.mount_sources:
+      mount_args += ["--mount",
+          "type=bind,src={0},dst=/opt/impala/sources,readonly".format(IMPALA_HOME)]
     # Add entries to the container's /etc/hosts file for the Docker host and the
     # gateway from the container to the host. These are needed for stable reverse
     # name resolution of the host's and the gateway's IP addresses
diff --git a/docker/CMakeLists.txt b/docker/CMakeLists.txt
index 4cf61c33b..e9c232eb3 100644
--- a/docker/CMakeLists.txt
+++ b/docker/CMakeLists.txt
@@ -32,40 +32,53 @@ cmake_host_system_information(RESULT OS_DISTRIB_VERSION_ID QUERY DISTRIB_VERSION
 set(QUICKSTART_BASE_IMAGE "UNSUPPORTED")
 set(DISTRO_BASE_IMAGE "UNSUPPORTED")
 
+MESSAGE(STATUS "OS_DISTRIB_ID is: ${OS_DISTRIB_ID}")
+MESSAGE(STATUS "OS_DISTRIB_VERSION_ID is: ${OS_DISTRIB_VERSION_ID}")
+
 # The CMake variables are using information from /etc/os-release.
 # A database of /etc/os-release files is available at
 # https://github.com/chef/os_release
 # These comparisons are based on those values.
-if(${OS_DISTRIB_ID} STREQUAL "ubuntu")
-  if(${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
-    set(DISTRO_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
-    set(QUICKSTART_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
-  endif()
-  if (${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
-      ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04")
-    set(PIP "python-pip")
-  elseif (${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
-          ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
-    set(PIP "python3-pip")
-  endif()
-elseif(${OS_DISTRIB_ID} STREQUAL "rhel" OR
-       ${OS_DISTRIB_ID} STREQUAL "rocky" OR
-       ${OS_DISTRIB_ID} STREQUAL "almalinux" OR
-       ${OS_DISTRIB_ID} STREQUAL "centos")
-  # The Quickstart images currently don't support using a Redhat
-  # base image, so this doesn't set QUICKSTART_BASE_IMAGE.
-  if(${OS_DISTRIB_VERSION_ID} MATCHES "7.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT7_DOCKER_BASE}")
-  elseif(${OS_DISTRIB_VERSION_ID} MATCHES "8.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT8_DOCKER_BASE}")
-  elseif(${OS_DISTRIB_VERSION_ID} MATCHES "9.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT9_DOCKER_BASE}")
+# A custom base image can be specified directly in the IMPALA_CUSTOM_DOCKER_BASE
+# environment variable if USE_CUSTOM_IMPALA_BASE_IMAGE to set to true in the environment.
+MESSAGE(STATUS "USE_CUSTOM_IMPALA_BASE_IMAGE is: $ENV{USE_CUSTOM_IMPALA_BASE_IMAGE}")
+if( "$ENV{USE_CUSTOM_IMPALA_BASE_IMAGE}" STREQUAL "true" )
+  set(DISTRO_BASE_IMAGE "$ENV{IMPALA_CUSTOM_DOCKER_BASE}")
+  # Don't publish a QuickStart image on a custom base image: it is unknown if it has
+  # all prerequisites, so don't set QUICKSTART_BASE_IMAGE.
+  MESSAGE(STATUS "Picked custom base image: ${DISTRO_BASE_IMAGE}")
+else()
+  if(${OS_DISTRIB_ID} STREQUAL "ubuntu")
+    if(${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
+      set(DISTRO_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
+      set(QUICKSTART_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
+    endif()
+    if (${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
+        ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04")
+      set(PIP "python-pip")
+    elseif (${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
+            ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
+      set(PIP "python3-pip")
+    endif()
+  elseif(${OS_DISTRIB_ID} STREQUAL "rhel" OR
+         ${OS_DISTRIB_ID} STREQUAL "rocky" OR
+         ${OS_DISTRIB_ID} STREQUAL "almalinux" OR
+         ${OS_DISTRIB_ID} STREQUAL "centos")
+    # The Quickstart images currently don't support using a Redhat
+    # base image, so this doesn't set QUICKSTART_BASE_IMAGE.
+    if(${OS_DISTRIB_VERSION_ID} MATCHES "7.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT7_DOCKER_BASE}")
+    elseif(${OS_DISTRIB_VERSION_ID} MATCHES "8.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT8_DOCKER_BASE}")
+    elseif(${OS_DISTRIB_VERSION_ID} MATCHES "9.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT9_DOCKER_BASE}")
+    endif()
   endif()
+  MESSAGE(STATUS "Picked docker base image based on host OS: ${DISTRO_BASE_IMAGE}")
 endif()
-MESSAGE(STATUS "Picked docker base image based on host OS: ${DISTRO_BASE_IMAGE}")
 
 
 if (NOT ${DISTRO_BASE_IMAGE} STREQUAL "UNSUPPORTED")
diff --git a/docker/daemon_entrypoint.sh b/docker/daemon_entrypoint.sh
index 643d680c5..897987816 100755
--- a/docker/daemon_entrypoint.sh
+++ b/docker/daemon_entrypoint.sh
@@ -36,17 +36,24 @@ DISTRIBUTION=Unknown
 if [[ -f /etc/redhat-release ]]; then
   echo "Identified Redhat image."
   DISTRIBUTION=Redhat
-else
+elif [[ -f /etc/lsb-release ]]; then
   source /etc/lsb-release
   if [[ $DISTRIB_ID == Ubuntu ]]; then
     echo "Identified Ubuntu image."
     DISTRIBUTION=Ubuntu
   fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [[ -f /etc/os-release ]]; then
+  source /etc/os-release
+  if [[ $ID == wolfi || $ID == chainguard ]]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
 fi
 
 if [[ $DISTRIBUTION == Unknown ]]; then
   echo "ERROR: Did not detect supported distribution."
-  echo "Only Ubuntu and Redhat-based distributions are supported."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi distributions are supported."
   exit 1
 fi
 
@@ -78,6 +85,17 @@ elif [[ $DISTRIBUTION == Redhat ]]; then
     echo "Detected Java 8"
     JAVA_HOME=/usr/lib/jvm/jre-1.8.0
   fi
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  if [[ -d /usr/lib/jvm/java-17-openjdk ]] ; then
+    echo "Detected Java 17"
+    JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+  elif [[ -d /usr/lib/jvm/java-11-openjdk ]] ; then
+    echo "Detected Java 11"
+    JAVA_HOME=/usr/lib/jvm/java-11-openjdk
+  elif [[ -d /usr/lib/jvm/java-1.8-openjdk ]] ; then
+    echo "Detected Java 8"
+    JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
+  fi
 fi
 
 if [[ $JAVA_HOME == Unknown ]]; then
diff --git a/docker/docker-build.sh b/docker/docker-build.sh
index e05eb605e..d881e441f 100755
--- a/docker/docker-build.sh
+++ b/docker/docker-build.sh
@@ -32,6 +32,7 @@ ARGS+=("--build-arg" 'VCS_TYPE=git')
 ARGS+=("--build-arg" 'VCS_URL=https://gitbox.apache.org/repos/asf/impala.git')
 ARGS+=("--build-arg" "VERSION=$VERSION")
 ARGS+=("--build-arg" "VCS_REF=$VCS_REF")
+ARGS+=("--progress=plain")
 
 # Add caller-provided arguments to end.
 ARGS+=("$@")
diff --git a/docker/impala_base/Dockerfile b/docker/impala_base/Dockerfile
index 2d84d2514..463887a1c 100644
--- a/docker/impala_base/Dockerfile
+++ b/docker/impala_base/Dockerfile
@@ -22,8 +22,16 @@ FROM ${BASE_IMAGE}
 # The level of debugging tools to install is set by the argument to  "--install-debug-tools".
 ARG INSTALL_OS_PACKAGES_ARGS="--install-debug-tools none"
 
+# Switch to root. Needed by hardened base images that don't default to the root user.
+USER root
+# Install Bash, if missing from the base image: not all of them have it by default,
+# but Impala scripts require it.
+ADD --chown=root:root --chmod=755 helper/install_bash_if_needed.sh /root
+RUN /root/install_bash_if_needed.sh
+
 # Install minimal dependencies required for Impala services to run.
-ADD helper/install_os_packages.sh /root
+ADD --chown=root:root --chmod=755 helper/install_os_packages.sh /root
+
 RUN /root/install_os_packages.sh ${INSTALL_OS_PACKAGES_ARGS}
 
 # Use a non-privileged impala user to run the daemons in the container.
diff --git a/docker/impala_profile_tool/Dockerfile b/docker/impala_profile_tool/Dockerfile
index 78fcae827..cd0e15ce0 100644
--- a/docker/impala_profile_tool/Dockerfile
+++ b/docker/impala_profile_tool/Dockerfile
@@ -22,11 +22,19 @@ FROM ${BASE_IMAGE}
 # If set to "--install-debug-tools full", then extra utilities will be installed.
 ARG INSTALL_OS_PACKAGES_ARGS="--install-debug-tools full"
 
+# Switch to root. Needed by hardened base images that don't default to the root user.
+USER root
+# Install Bash, if missing from the base image: not all of them have it by default,
+# but Impala scripts require it.
+ADD --chown=root:root --chmod=755 helper/install_bash_if_needed.sh /root
+RUN /root/install_bash_if_needed.sh
+
 # Install dependencies required for Impala utility binaries to run, plus
 # some useful utilities.
 # TODO: ideally we wouldn't depend on the JVM libraries, but currently the JNI code
 # in be/ is not cleanly separated from the code that doesn't use JNI.
-ADD helper/install_os_packages.sh /root
+ADD --chown=root:root --chmod=755 helper/install_os_packages.sh /root
+
 RUN /root/install_os_packages.sh ${INSTALL_OS_PACKAGES_ARGS}
 
 # Use a non-privileged impala user to run the processes in the container.
@@ -43,7 +51,8 @@ COPY --chown=impala lib /opt/impala/lib
 
 WORKDIR /opt/impala/
 
-ENTRYPOINT ["/opt/impala/bin/utility_entrypoint.sh", "/opt/impala/bin/impala-profile-tool",\
+ENTRYPOINT ["/opt/impala/bin/utility_entrypoint.sh", \
+     "/opt/impala/bin/impala-profile-tool",\
      "-logtostderr"]
 
 LABEL name="Apache Impala Profile Tool" \
diff --git a/docker/install_bash_if_needed.sh b/docker/install_bash_if_needed.sh
new file mode 100755
index 000000000..be0b933cc
--- /dev/null
+++ b/docker/install_bash_if_needed.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# This installs bash if the base Docker image does not have it.
+# Bash is needed for the other helper shell scripts for Impala images,
+# so the first step is to make it available.
+
+set -eu
+
+# Default level of extra debugging tools, controlled by the --install-debug-tools flag.
+INSTALL_DEBUG_TOOLS=none
+
+DRY_RUN=false
+
+if command -v bash; then
+  echo "Bash found, skipping installation."
+  exit 0
+fi
+
+# This can get more detailed if there are specific steps
+# for specific versions, but at the moment the distribution
+# is all we need.
+DISTRIBUTION=Unknown
+if [ -f /etc/redhat-release ]; then
+  echo "Identified Redhat system."
+  DISTRIBUTION=Redhat
+elif [ -f /etc/lsb-release ]; then
+  . /etc/lsb-release
+  if [ $DISTRIB_ID = Ubuntu ]; then
+    echo "Identified Ubuntu system."
+    DISTRIBUTION=Ubuntu
+  fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [ -f /etc/os-release ]; then
+  source /etc/os-release
+  if [ $ID = wolfi -o $ID = chainguard ]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
+fi
+
+if [ $DISTRIBUTION = Unknown ]; then
+  echo "ERROR: Did not detect supported distribution."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi base images are supported."
+  exit 1
+fi
+
+# Install minimal set of files.
+# Optionally install extra debug tools.
+if [ $DISTRIBUTION = Ubuntu ]; then
+  export DEBIAN_FRONTEND=noninteractive
+  apt-get update
+  apt-get install -y bash
+elif [ $DISTRIBUTION = Redhat ]; then
+  yum install -y bash
+elif [ $DISTRIBUTION = Chainguard ]; then
+  apk add --no-cache --no-interactive bash
+fi
+
diff --git a/docker/install_os_packages.sh b/docker/install_os_packages.sh
index 4422fd579..d387e5097 100755
--- a/docker/install_os_packages.sh
+++ b/docker/install_os_packages.sh
@@ -29,7 +29,7 @@ INSTALL_DEBUG_TOOLS=none
 JAVA_VERSION=8
 DRY_RUN=false
 PKG_LIST=""
-NON_PKG_NAMES=(apt-get yum install update)
+NON_PKG_NAMES=(apt-get yum apk install update add)
 
 function print_usage {
     echo "install_os_packages.sh - Helper script to install OS dependencies"
@@ -105,7 +105,7 @@ DISTRIBUTION=Unknown
 if [[ -f /etc/redhat-release ]]; then
   echo "Identified Redhat system."
   DISTRIBUTION=Redhat
-else
+elif [[ -f /etc/lsb-release ]]; then
   source /etc/lsb-release
   if [[ $DISTRIB_ID == Ubuntu ]]; then
     echo "Identified Ubuntu system."
@@ -118,11 +118,18 @@ else
       exit 1
     fi
   fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [[ -f /etc/os-release ]]; then
+  source /etc/os-release
+  if [[ $ID == wolfi || $ID == chainguard ]]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
 fi
 
 if [[ $DISTRIBUTION == Unknown ]]; then
   echo "ERROR: Did not detect supported distribution."
-  echo "Only Ubuntu and Redhat-based distributions are supported."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi base images are supported."
   exit 1
 fi
 
@@ -202,6 +209,49 @@ elif [[ $DISTRIBUTION == Redhat ]]; then
         vim \
         which
   fi
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  # Package inventory:
+  # glibc-locale-en and posix-libc-utils: for locale and 'locale' tool support
+  # shadow: for groupadd and friends
+  wrap apk add --no-cache --no-interactive \
+    glibc-locale-posix \
+    glibc-locale-en \
+    glibc-iconv \
+    posix-libc-utils \
+    localedef \
+    cyrus-sasl \
+    krb5-libs \
+    openssl \
+    openldap-dev \
+    openjdk-${JAVA_VERSION}-jre \
+    openjdk-${JAVA_VERSION}-default-jvm \
+    shadow \
+    tzdata
+
+  # Set up /etc/localtime so that the container has a default local timezone.
+  # Make this UTC.
+  ln -sf /usr/share/zoneinfo/UTC /etc/localtime
+
+  if [[ $INSTALL_DEBUG_TOOLS == basic || $INSTALL_DEBUG_TOOLS == full ]]; then
+    echo "Installing basic debug tools"
+    wrap apk add --no-cache --no-interactive \
+        gdb \
+        openjdk-${JAVA_VERSION}-default-jdk
+  fi
+
+  if [[ $INSTALL_DEBUG_TOOLS == full ]]; then
+    echo "Installing full debug tools"
+    wrap apk add --no-cache --no-interactive \
+        bind-tools \
+        curl \
+        iproute2 \
+        iputils \
+        less \
+        nmap \
+        sudo \
+        tzutils \
+        vim
+  fi
 fi
 
 if $DRY_RUN; then
@@ -227,9 +277,16 @@ if ! command -v pgrep ; then
   exit 1
 fi
 
+# Java must be accessible for both the frontend and the backend. Verify it is present.
+if ! command -v java; then
+  echo "ERROR: Java cannot be found."
+  exit 1
+fi
+
 # Impala will fail to start if the permissions on /var/tmp are not set to include
 # the sticky bit (i.e. +t). Some versions of Redhat UBI images do not have
 # this set by default, so specifically set the sticky bit for both /tmp and /var/tmp.
+mkdir -p /var/tmp
 chmod a=rwx,o+t /var/tmp /tmp
 
 # To minimize the size for the Docker image, clean up any unnecessary files.
@@ -239,4 +296,6 @@ if [[ $DISTRIBUTION == Ubuntu ]]; then
 elif [[ $DISTRIBUTION == Redhat ]]; then
   yum clean all
   rm -rf /var/cache/yum/*
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  rm -rf /var/cache/apk/*
 fi
diff --git a/docker/setup_build_context.py b/docker/setup_build_context.py
index b46d88043..66b5e7eb6 100755
--- a/docker/setup_build_context.py
+++ b/docker/setup_build_context.py
@@ -179,6 +179,11 @@ for kudu_lib_dir in kudu_lib_dirs:
 if not found_kudu_so:
   raise Exception("No Kudu shared object found in search path: {0}".format(kudu_lib_dirs))
 
+# Add script for installing Bash: Impala scripts need it, but some minimal base images
+# may either omit shells altogether, or contain a simpler, smaller variant, e.g. sh.
+symlink_file_into_dir(
+    os.path.join(IMPALA_HOME, "docker/install_bash_if_needed.sh"), HELPER_DIR)
+
 # Add script for installing OS packages
 symlink_file_into_dir(
     os.path.join(IMPALA_HOME, "docker/install_os_packages.sh"), HELPER_DIR)
diff --git a/tests/common/impala_connection.py b/tests/common/impala_connection.py
index 18e882c32..5e1d06266 100644
--- a/tests/common/impala_connection.py
+++ b/tests/common/impala_connection.py
@@ -131,8 +131,8 @@ def collect_default_query_options(options, name, val, kind):
     return
   name = name.lower()
   val = str(val).strip('"')
-  if ',' in val:
-    # Value is a list. Wrap it with double quote.
+  if ',' in val or '/' in val:
+    # Value is a list or a timezone name containing a slash. Wrap it with double quotes.
     val = '"{}"'.format(val)
   if not val:
     # Value is optional with None as default.
