This is an automated email from the ASF dual-hosted git repository. laszlog pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git
commit e6078b42819b3642d0030992f77ff030abf2db9f
Author: Laszlo Gaal <[email protected]>
AuthorDate: Wed Sep 11 19:22:23 2024 +0200

    IMPALA-13825: Extend Docker container build to custom base images

    Downstream system vendors, users and customers have lately expressed
    interest in consuming Impala in containerized forms, taking advantage
    of various specialized, hardened container base image offerings, like
    container offerings based on the Wolfi project by Chainguard; see:
    https://github.com/wolfi-dev.

    This patch enables Impala container images to be built on top of custom
    base images, and adds an implementation example that uses the publicly
    available Wolfi base image.

    Building a customized Docker image follows a hybrid approach. Instead of
    replicating the complete Impala build process inside a Wolfi container
    for a fully native binary build, it relies on an existing build platform
    that is compatible with the binary packages available inside the custom
    container image. For Wolfi the Impala binaries are supplied by the
    Red Hat 9 build of Impala. This is made possible by the fact that major
    library dependencies of Impala have the same versions on Wolfi OS and
    Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi with no
    changes.

    The binaries produced by the regular build process are then installed
    into a Docker image built on top of an explicitly specified custom base
    image.

    The selection of a custom base image is controlled by two environment
    variables:
    - USE_CUSTOM_IMPALA_BASE_IMAGE (boolean): If set to 'true', triggers the
      use of the custom image. When set to 'false' or left unspecified, the
      Docker base image is selected by the existing logic of matching the
      build platform's operating system.
    - IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image

    These environment variables can be overridden from the environment, from
    impala-config-branch.sh, or impala-config-local.sh.
    They are reported at the end of bin/impala-config.sh where important
    environment variables are listed. They are also added to the list of
    variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
    that they can be used in the context of Jenkins jobs as well.

    The unified script that installs Impala's required dependencies into
    the container image is extended for Wolfi to handle APK packages.

    A new script is added to install Bash in the Docker image if it is
    missing. Impala build scripts (including the scripts used during Docker
    image builds) as well as container startup scripts require Bash, but
    minimal container base images usually omit it, favoring a smaller
    alternative.

    To improve the debugging experience for a containerized Impala
    minicluster, the minicluster starter script bin/start-impala-cluster.py
    is extended with the following features:
    - synchronizes every launched container's timezone to the host. This is
      needed for Iceberg time-travel tests, which create timestamped Iceberg
      metadata items in the impalad context inside a container, but check
      creation/modification times of the same items in the test scripts
      running on the host, outside the containers. The test scripts have
      the implicit expectation that the same local time is shared across all
      these contexts, but this is not necessarily true if the host where the
      tests are running is set to a timezone other than UTC.
      Time synchronization is achieved by injecting the TZ environment
      variable into the container, holding the name of the timezone used on
      the host. The timezone name is taken either from the host's TZ variable
      (if set), or from the host's /etc/localtime symlink, checking the name
      of the timezone file it points to. In case /etc/localtime is not a
      symlink (and TZ is not set on the host), the host's /etc/localtime file
      is mounted into the container.
    - sets up a directory for each container to collect the Java VM error
      files (hs_err_pidNNNN.log) from the containers.
    - adds the --mount_sources command line parameter, which mounts the
      complete $IMPALA_HOME subtree into the container at
      /opt/impala/sources to make source code available inside the container
      for easier debugging.

    Tested by running core-mode tests in the following environments:
    - Regular run (impalad running natively on the platform) on Ubuntu 20.04
    - Regular run on Rocky Linux 9.2
    - Dockerised run (impalad instances running in their individual
      containers) using Ubuntu 20.04 containers
    - Dockerised run (impalad instances running in their individual
      containers) using Rocky Linux 9.2 containers
    - Dockerised run (impalad instances running in their individual
      containers) using Wolfi's wolfi-base containers

    Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
    Reviewed-on: http://gerrit.cloudera.org:8080/22583
    Reviewed-by: Laszlo Gaal <[email protected]>
    Reviewed-by: Csaba Ringhofer <[email protected]>
    Reviewed-by: Jason Fehr <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 bin/impala-config.sh                               | 11 ++++
 .../dockerized-impala-bootstrap-and-test.sh        |  3 +-
 bin/start-impala-cluster.py                        | 60 ++++++++++++++++++
 docker/CMakeLists.txt                              | 69 ++++++++++++--------
 docker/daemon_entrypoint.sh                        | 22 ++++++-
 docker/docker-build.sh                             |  1 +
 docker/impala_base/Dockerfile                      | 10 ++-
 docker/impala_profile_tool/Dockerfile              | 13 +++-
 docker/install_bash_if_needed.sh                   | 74 ++++++++++++++++++++++
 docker/install_os_packages.sh                      | 65 ++++++++++++++++++-
 docker/setup_build_context.py                      |  5 ++
 tests/common/impala_connection.py                  |  4 +-
 12 files changed, 298 insertions(+), 39 deletions(-)

diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 689a3a069..b86c46393 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -297,6 +297,15 @@ export IMPALA_DATASKETCHES_VERSION=6.0.0
 export IMPALA_REDHAT7_DOCKER_BASE=${IMPALA_REDHAT7_DOCKER_BASE:-"centos:centos7.9.2009"}
 export IMPALA_REDHAT8_DOCKER_BASE=${IMPALA_REDHAT8_DOCKER_BASE:-"rockylinux:8.5"}
 export IMPALA_REDHAT9_DOCKER_BASE=${IMPALA_REDHAT9_DOCKER_BASE:-"rockylinux:9.2"}
+# Some users may want to use special, hardened base images for increased security.
+# These images are usually not related to the OS where the build is running.
+# The following environment variables allow a specific base image to be specified
+# directly, without relying on the implicit build platform identification in
+# CMakeLists.txt.
+# Images published by Chainguard and the Wolfi project are known to be used, so the
+# publicly available Wolfi base image is used as a default example.
+export IMPALA_CUSTOM_DOCKER_BASE=${IMPALA_CUSTOM_DOCKER_BASE:-"cgr.dev/chainguard/wolfi-base:latest"}
+export USE_CUSTOM_IMPALA_BASE_IMAGE=${USE_CUSTOM_IMPALA_BASE_IMAGE:-false}

 # Selects the version of Java to use when start-impala-cluster.py starts with container
 # images (created via e.g. 'make docker_debug_java11_images'). The Java version used in
@@ -1230,6 +1239,8 @@ echo "IMPALA_SYSTEM_PYTHON2 = $IMPALA_SYSTEM_PYTHON2"
 echo "IMPALA_SYSTEM_PYTHON3 = $IMPALA_SYSTEM_PYTHON3"
 echo "IMPALA_BUILD_THREADS = $IMPALA_BUILD_THREADS"
 echo "NUM_CONCURRENT_TESTS = $NUM_CONCURRENT_TESTS"
+echo "USE_CUSTOM_IMPALA_BASE_IMAGE = $USE_CUSTOM_IMPALA_BASE_IMAGE"
+echo "IMPALA_CUSTOM_DOCKER_BASE = $IMPALA_CUSTOM_DOCKER_BASE"

 # Kerberos things. If the cluster exists and is kerberized, source
 # the required environment. This is required for any hadoop tool to
diff --git a/bin/jenkins/dockerized-impala-bootstrap-and-test.sh b/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
index 07ab2612d..d2493de49 100755
--- a/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
+++ b/bin/jenkins/dockerized-impala-bootstrap-and-test.sh
@@ -36,7 +36,8 @@ source ./bin/bootstrap_system.sh
 # to preserve additional variables.
 ./bin/jenkins/dockerized-impala-preserve-vars.py \
     EE_TEST EE_TEST_FILES JDBC_TEST EXPLORATION_STRATEGY CMAKE_BUILD_TYPE \
-    IMPALA_DOCKER_JAVA
+    IMPALA_DOCKER_JAVA IMPALA_CUSTOM_DOCKER_BASE USE_CUSTOM_IMPALA_BASE_IMAGE \
+    IMPALA_TOOLCHAIN_HOST

 # Execute the tests using su to re-login so that group change made above
 # setup_docker takes effect. This does a full re-login and does not stay
diff --git a/bin/start-impala-cluster.py b/bin/start-impala-cluster.py
index 725525d5b..292604b3f 100755
--- a/bin/start-impala-cluster.py
+++ b/bin/start-impala-cluster.py
@@ -54,6 +54,7 @@ KUDU_MASTER_HOSTS = os.getenv("KUDU_MASTER_HOSTS", "127.0.0.1")
 DEFAULT_IMPALA_MAX_LOG_FILES = os.environ.get("IMPALA_MAX_LOG_FILES", 10)
 INTERNAL_LISTEN_HOST = os.getenv("INTERNAL_LISTEN_HOST", "localhost")
 TARGET_FILESYSTEM = os.getenv("TARGET_FILESYSTEM") or "hdfs"
+HOST_TZ = os.getenv("TZ", None)

 # Options
 parser = OptionParser()
@@ -133,6 +134,9 @@ parser.add_option("--docker_auto_ports", dest="docker_auto_ports",
                   "(Beewax, HS2, Web UIs, etc), which avoids collisions with other "
                   "running processes. If false, ports are mapped to the same ports "
                   "on localhost as the non-docker impala cluster.")
+parser.add_option("--mount_sources", dest="mount_sources", action="store_true",
+                  help="Mount the $IMPALA_HOME directory as /opt/impala/sources into "
+                  "the containers for easier debugging.")
 parser.add_option("--data_cache_dir", dest="data_cache_dir", default=None,
                   help="This specifies a base directory in which the IO data cache will "
                   "use.")
@@ -254,6 +258,9 @@ def build_java_tool_options(jvm_debug_port=None):
   """Construct the value of the JAVA_TOOL_OPTIONS environment variable to pass to
   daemons."""
   java_tool_options = ""
+  # In a Docker container the Java error file location is always fixed.
+  if options.docker_network is not None:
+    java_tool_options = "-XX:ErrorFile=/opt/impala/java-error/hs_err_pid_%p.log"
   if jvm_debug_port is not None:
     java_tool_options = ("-agentlib:jdwp=transport=dt_socket,address={debug_port}," +
       "server=y,suspend=n ").format(debug_port=jvm_debug_port) + java_tool_options
@@ -988,6 +995,43 @@ class DockerMiniClusterOperations(object):
     env_args = ["-e", "HADOOP_USER_NAME={0}".format(getpass.getuser()),
                 "-e", "JAVA_TOOL_OPTIONS={0}".format(
                 build_java_tool_options(DEFAULT_IMPALAD_JVM_DEBUG_PORT))]
+
+    # Calculate the timezone to pass into the container.
+    # Mounting /etc/localtime into the container does not work when /etc/localtime is a
+    # symbolic link to a real timezone file inside /usr/share/zoneinfo: Linux resolves
+    # the symlink before performing the bind mount, so you can't create a symlink within
+    # the container.
+    # Set the timezone by injecting the TZ environment variable with the desired timezone
+    # string instead. Initialize the env var from the host's TZ variable if it exists,
+    # or calculate the value (the timezone specifier) from the name of the timezone file
+    # pointed to by /etc/localtime, if it is a symlink.
+    # If /etc/localtime is a real file, and TZ is undefined on the host, then mount
+    # /etc/localtime into the container
+    timezone_as_env_var = True
+    timezone_as_mount = False
+
+    if HOST_TZ is None:
+      try:
+        if os.path.islink("/etc/localtime"):
+          # This is a symlink, so figure out where it points, cut the prefix, and hope
+          # we'll get a timezone spec. Don't confuse realpath() and relpath() here!
+          timezone_string = os.path.realpath("/etc/localtime")
+          timezone_string = os.path.relpath(timezone_string, "/usr/share/zoneinfo")
+        elif os.path.isfile("/etc/localtime"):
+          # This is a real file, and we'll just have to mount it into the container
+          timezone_as_env_var = False
+          timezone_as_mount = True
+        else:
+          timezone_as_env_var = False
+          timezone_as_mount = False
+          LOG.warning("Unable to determine local timezone, "
+                      "containers will user their default timezones.")
+      except OSError as ex:
+        timezone_as_env_var = False
+        LOG.error("Unable to map /etc/localtime to a timezone name. Reported error"
+                  "is {0}".format(ex))
+    if timezone_as_env_var:
+      env_args += ["-e", "TZ={0}".format(timezone_string)]
     # The container build processes tags the generated image with the daemon name.
     debug_build = options.build_type == "debug" or (options.build_type == "latest" and
         os.path.basename(os.path.dirname(os.readlink("be/build/latest"))) == "debug")
@@ -1009,6 +1053,22 @@ class DockerMiniClusterOperations(object):
     if not os.path.isdir(log_dir):
       os.makedirs(log_dir)
     mount_args += ["--mount", "type=bind,src={0},dst=/opt/impala/logs".format(log_dir)]
+    # Collect Java error files hs_err_pidNNN.log in a unique subdirectory per daemon to
+    # avoid any potential interaction between containers, which should be isolated.
+    java_error_dir = os.path.join(IMPALA_HOME, options.log_dir, host_name, "java-error")
+    if not os.path.isdir(java_error_dir):
+      os.makedirs(java_error_dir)
+    mount_args += ["--mount", "type=bind,src={0},dst=/opt/impala/java-errors".format(
+        java_error_dir)]
+    # If /etc/localtime was found to be a real file instead of a symlink, then mount it
+    # into the container to ensure consistent clocks between the host and the Impala
+    # containers. This is important for logs as well as Iceberg tests.
+    if timezone_as_mount:
+      mount_args += ["--mount",
+                     "type=bind,src=/etc/localtime,dst=/etc/localtime,readonly"]
+    if options.mount_sources:
+      mount_args += ["--mount",
+          "type=bind,src={0},dst=/opt/impala/sources,readonly".format(IMPALA_HOME)]
     # Add entries to the container's /etc/hosts file for the Docker host and the
     # gateway from the container to the host. These are needed for stable reverse
     # name resolution of the host's and the gateway's IP addresses
diff --git a/docker/CMakeLists.txt b/docker/CMakeLists.txt
index 4cf61c33b..e9c232eb3 100644
--- a/docker/CMakeLists.txt
+++ b/docker/CMakeLists.txt
@@ -32,40 +32,53 @@ cmake_host_system_information(RESULT OS_DISTRIB_VERSION_ID QUERY DISTRIB_VERSION
 set(QUICKSTART_BASE_IMAGE "UNSUPPORTED")
 set(DISTRO_BASE_IMAGE "UNSUPPORTED")

+MESSAGE(STATUS "OS_DISTRIB_ID is: ${OS_DISTRIB_ID}")
+MESSAGE(STATUS "OS_DISTRIB_VERSION_ID is: ${OS_DISTRIB_VERSION_ID}")
+
 # The CMake variables are using information from /etc/os-release.
 # A database of /etc/os-release files is available at
 # https://github.com/chef/os_release
 # These comparisons are based on those values.
-if(${OS_DISTRIB_ID} STREQUAL "ubuntu")
-  if(${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
-     ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
-    set(DISTRO_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
-    set(QUICKSTART_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
-  endif()
-  if (${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
-      ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04")
-    set(PIP "python-pip")
-  elseif (${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
-          ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
-    set(PIP "python3-pip")
-  endif()
-elseif(${OS_DISTRIB_ID} STREQUAL "rhel" OR
-       ${OS_DISTRIB_ID} STREQUAL "rocky" OR
-       ${OS_DISTRIB_ID} STREQUAL "almalinux" OR
-       ${OS_DISTRIB_ID} STREQUAL "centos")
-  # The Quickstart images currently don't support using a Redhat
-  # base image, so this doesn't set QUICKSTART_BASE_IMAGE.
-  if(${OS_DISTRIB_VERSION_ID} MATCHES "7.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT7_DOCKER_BASE}")
-  elseif(${OS_DISTRIB_VERSION_ID} MATCHES "8.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT8_DOCKER_BASE}")
-  elseif(${OS_DISTRIB_VERSION_ID} MATCHES "9.*")
-    set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT9_DOCKER_BASE}")
+# A custom base image can be specified directly in the IMPALA_CUSTOM_DOCKER_BASE
+# environment variable if USE_CUSTOM_IMPALA_BASE_IMAGE to set to true in the environment.
+MESSAGE(STATUS "USE_CUSTOM_IMPALA_BASE_IMAGE is: $ENV{USE_CUSTOM_IMPALA_BASE_IMAGE}")
+if( "$ENV{USE_CUSTOM_IMPALA_BASE_IMAGE}" STREQUAL "true" )
+  set(DISTRO_BASE_IMAGE "$ENV{IMPALA_CUSTOM_DOCKER_BASE}")
+  # Don't publish a QuickStart image on a custom base image: it is unknown if it has
+  # all prerequisites, so don't set QUICKSTART_BASE_IMAGE.
+  MESSAGE(STATUS "Picked custom base image: ${DISTRO_BASE_IMAGE}")
+else()
+  if(${OS_DISTRIB_ID} STREQUAL "ubuntu")
+    if(${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
+       ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
+      set(DISTRO_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
+      set(QUICKSTART_BASE_IMAGE "ubuntu:${OS_DISTRIB_VERSION_ID}")
+    endif()
+    if (${OS_DISTRIB_VERSION_ID} STREQUAL "16.04" OR
+        ${OS_DISTRIB_VERSION_ID} STREQUAL "18.04")
+      set(PIP "python-pip")
+    elseif (${OS_DISTRIB_VERSION_ID} STREQUAL "20.04" OR
+            ${OS_DISTRIB_VERSION_ID} STREQUAL "22.04")
+      set(PIP "python3-pip")
+    endif()
+  elseif(${OS_DISTRIB_ID} STREQUAL "rhel" OR
+         ${OS_DISTRIB_ID} STREQUAL "rocky" OR
+         ${OS_DISTRIB_ID} STREQUAL "almalinux" OR
+         ${OS_DISTRIB_ID} STREQUAL "centos")
+    # The Quickstart images currently don't support using a Redhat
+    # base image, so this doesn't set QUICKSTART_BASE_IMAGE.
+    if(${OS_DISTRIB_VERSION_ID} MATCHES "7.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT7_DOCKER_BASE}")
+    elseif(${OS_DISTRIB_VERSION_ID} MATCHES "8.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT8_DOCKER_BASE}")
+    elseif(${OS_DISTRIB_VERSION_ID} MATCHES "9.*")
+      set(DISTRO_BASE_IMAGE "$ENV{IMPALA_REDHAT9_DOCKER_BASE}")
+    endif()
   endif()
+  MESSAGE(STATUS "Picked docker base image based on host OS: ${DISTRO_BASE_IMAGE}")
 endif()
-MESSAGE(STATUS "Picked docker base image based on host OS: ${DISTRO_BASE_IMAGE}")

 if (NOT ${DISTRO_BASE_IMAGE} STREQUAL "UNSUPPORTED")

diff --git a/docker/daemon_entrypoint.sh b/docker/daemon_entrypoint.sh
index 643d680c5..897987816 100755
--- a/docker/daemon_entrypoint.sh
+++ b/docker/daemon_entrypoint.sh
@@ -36,17 +36,24 @@ DISTRIBUTION=Unknown
 if [[ -f /etc/redhat-release ]]; then
   echo "Identified Redhat image."
   DISTRIBUTION=Redhat
-else
+elif [[ -f /etc/lsb-release ]]; then
   source /etc/lsb-release
   if [[ $DISTRIB_ID == Ubuntu ]]; then
     echo "Identified Ubuntu image."
     DISTRIBUTION=Ubuntu
   fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [[ -f /etc/os-release ]]; then
+  source /etc/os-release
+  if [[ $ID == wolfi || $ID == chainguard ]]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
 fi

 if [[ $DISTRIBUTION == Unknown ]]; then
   echo "ERROR: Did not detect supported distribution."
-  echo "Only Ubuntu and Redhat-based distributions are supported."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi distributions are supported."
   exit 1
 fi

@@ -78,6 +85,17 @@ elif [[ $DISTRIBUTION == Redhat ]]; then
     echo "Detected Java 8"
     JAVA_HOME=/usr/lib/jvm/jre-1.8.0
   fi
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  if [[ -d /usr/lib/jvm/java-17-openjdk ]] ; then
+    echo "Detected Java 17"
+    JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+  elif [[ -d /usr/lib/jvm/java-11-openjdk ]] ; then
+    echo "Detected Java 11"
+    JAVA_HOME=/usr/lib/jvm/java-11-openjdk
+  elif [[ -d /usr/lib/jvm/java-1.8-openjdk ]] ; then
+    echo "Detected Java 8"
+    JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
+  fi
 fi

 if [[ $JAVA_HOME == Unknown ]]; then
diff --git a/docker/docker-build.sh b/docker/docker-build.sh
index e05eb605e..d881e441f 100755
--- a/docker/docker-build.sh
+++ b/docker/docker-build.sh
@@ -32,6 +32,7 @@ ARGS+=("--build-arg" 'VCS_TYPE=git')
 ARGS+=("--build-arg" 'VCS_URL=https://gitbox.apache.org/repos/asf/impala.git')
 ARGS+=("--build-arg" "VERSION=$VERSION")
 ARGS+=("--build-arg" "VCS_REF=$VCS_REF")
+ARGS+=("--progress=plain")

 # Add caller-provided arguments to end.
 ARGS+=("$@")
diff --git a/docker/impala_base/Dockerfile b/docker/impala_base/Dockerfile
index 2d84d2514..463887a1c 100644
--- a/docker/impala_base/Dockerfile
+++ b/docker/impala_base/Dockerfile
@@ -22,8 +22,16 @@ FROM ${BASE_IMAGE}
 # The level of debugging tools to install is set by the argument to "--install-debug-tools".
 ARG INSTALL_OS_PACKAGES_ARGS="--install-debug-tools none"

+# Switch to root. Needed by hardened base images that don't default to the root user.
+USER root
+# Install Bash, if missing from the base image: not all of them have it by default,
+# but Impala scripts require it.
+ADD --chown=root:root --chmod=755 helper/install_bash_if_needed.sh /root
+RUN /root/install_bash_if_needed.sh
+
 # Install minimal dependencies required for Impala services to run.
-ADD helper/install_os_packages.sh /root
+ADD --chown=root:root --chmod=755 helper/install_os_packages.sh /root
+
 RUN /root/install_os_packages.sh ${INSTALL_OS_PACKAGES_ARGS}

 # Use a non-privileged impala user to run the daemons in the container.
diff --git a/docker/impala_profile_tool/Dockerfile b/docker/impala_profile_tool/Dockerfile
index 78fcae827..cd0e15ce0 100644
--- a/docker/impala_profile_tool/Dockerfile
+++ b/docker/impala_profile_tool/Dockerfile
@@ -22,11 +22,19 @@ FROM ${BASE_IMAGE}
 # If set to "--install-debug-tools full", then extra utilities will be installed.
 ARG INSTALL_OS_PACKAGES_ARGS="--install-debug-tools full"

+# Switch to root. Needed by hardened base images that don't default to the root user.
+USER root
+# Install Bash, if missing from the base image: not all of them have it by default,
+# but Impala scripts require it.
+ADD --chown=root:root --chmod=755 helper/install_bash_if_needed.sh /root
+RUN /root/install_bash_if_needed.sh
+
 # Install dependencies required for Impala utility binaries to run, plus
 # some useful utilities.
 # TODO: ideally we wouldn't depend on the JVM libraries, but currently the JNI code
 # in be/ is not cleanly separated from the code that doesn't use JNI.
-ADD helper/install_os_packages.sh /root
+ADD --chown=root:root --chmod=755 helper/install_os_packages.sh /root
+
 RUN /root/install_os_packages.sh ${INSTALL_OS_PACKAGES_ARGS}

 # Use a non-privileged impala user to run the processes in the container.
@@ -43,7 +51,8 @@ COPY --chown=impala lib /opt/impala/lib

 WORKDIR /opt/impala/

-ENTRYPOINT ["/opt/impala/bin/utility_entrypoint.sh", "/opt/impala/bin/impala-profile-tool",\
+ENTRYPOINT ["/opt/impala/bin/utility_entrypoint.sh", \
+     "/opt/impala/bin/impala-profile-tool",\
      "-logtostderr"]

 LABEL name="Apache Impala Profile Tool" \
diff --git a/docker/install_bash_if_needed.sh b/docker/install_bash_if_needed.sh
new file mode 100755
index 000000000..be0b933cc
--- /dev/null
+++ b/docker/install_bash_if_needed.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# This installs bash if the base Docker image does not have it.
+# Bash is needed for the other helper shell scripts for Impala images,
+# so the first step is to make it available.
+
+set -eu
+
+# Default level of extra debugging tools, controlled by the --install-debug-tools flag.
+INSTALL_DEBUG_TOOLS=none
+
+DRY_RUN=false
+
+if command -v bash; then
+  echo "Bash found, skipping installation."
+  exit 0
+fi
+
+# This can get more detailed if there are specific steps
+# for specific versions, but at the moment the distribution
+# is all we need.
+DISTRIBUTION=Unknown
+if [ -f /etc/redhat-release ]; then
+  echo "Identified Redhat system."
+  DISTRIBUTION=Redhat
+elif [ -f /etc/lsb-release ]; then
+  . /etc/lsb-release
+  if [ $DISTRIB_ID = Ubuntu ]; then
+    echo "Identified Ubuntu system."
+    DISTRIBUTION=Ubuntu
+  fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [ -f /etc/os-release ]; then
+  source /etc/os-release
+  if [ $ID = wolfi -o $ID = chainguard ]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
+fi
+
+if [ $DISTRIBUTION = Unknown ]; then
+  echo "ERROR: Did not detect supported distribution."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi base images are supported."
+  exit 1
+fi
+
+# Install minimal set of files.
+# Optionally install extra debug tools.
+if [ $DISTRIBUTION = Ubuntu ]; then
+  export DEBIAN_FRONTEND=noninteractive
+  apt-get update
+  apt-get install -y bash
+elif [ $DISTRIBUTION = Redhat ]; then
+  yum install -y bash
+elif [ $DISTRIBUTION = Chainguard ]; then
+  apk add --no-cache --no-interactive bash
+fi
+
diff --git a/docker/install_os_packages.sh b/docker/install_os_packages.sh
index 4422fd579..d387e5097 100755
--- a/docker/install_os_packages.sh
+++ b/docker/install_os_packages.sh
@@ -29,7 +29,7 @@ INSTALL_DEBUG_TOOLS=none
 JAVA_VERSION=8
 DRY_RUN=false
 PKG_LIST=""
-NON_PKG_NAMES=(apt-get yum install update)
+NON_PKG_NAMES=(apt-get yum apk install update add)

 function print_usage {
   echo "install_os_packages.sh - Helper script to install OS dependencies"
@@ -105,7 +105,7 @@ DISTRIBUTION=Unknown
 if [[ -f /etc/redhat-release ]]; then
   echo "Identified Redhat system."
   DISTRIBUTION=Redhat
-else
+elif [[ -f /etc/lsb-release ]]; then
   source /etc/lsb-release
   if [[ $DISTRIB_ID == Ubuntu ]]; then
     echo "Identified Ubuntu system."
@@ -118,11 +118,18 @@ else
       exit 1
     fi
   fi
+# Check /etc/os-release last: it exists on Red Hat and Ubuntu systems as well
+elif [[ -f /etc/os-release ]]; then
+  source /etc/os-release
+  if [[ $ID == wolfi || $ID == chainguard ]]; then
+    echo "Identified Wolfi-based system."
+    DISTRIBUTION=Chainguard
+  fi
 fi

 if [[ $DISTRIBUTION == Unknown ]]; then
   echo "ERROR: Did not detect supported distribution."
-  echo "Only Ubuntu and Redhat-based distributions are supported."
+  echo "Only Ubuntu, Red Hat (or related), or Wolfi base images are supported."
   exit 1
 fi

@@ -202,6 +209,49 @@ elif [[ $DISTRIBUTION == Redhat ]]; then
       vim \
       which
   fi
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  # Package inventory:
+  # glibc-locale-en and posix-libc-utils: for locale and 'locale' tool support
+  # shadow: for groupadd and friends
+  wrap apk add --no-cache --no-interactive \
+      glibc-locale-posix \
+      glibc-locale-en \
+      glibc-iconv \
+      posix-libc-utils \
+      localedef \
+      cyrus-sasl \
+      krb5-libs \
+      openssl \
+      openldap-dev \
+      openjdk-${JAVA_VERSION}-jre \
+      openjdk-${JAVA_VERSION}-default-jvm \
+      shadow \
+      tzdata
+
+  # Set up /etc/localtime so that the container has a default local timezone.
+  # Make this UTC.
+  ln -sf /usr/share/zoneinfo/UTC /etc/localtime
+
+  if [[ $INSTALL_DEBUG_TOOLS == basic || $INSTALL_DEBUG_TOOLS == full ]]; then
+    echo "Installing basic debug tools"
+    wrap apk add --no-cache --no-interactive \
+        gdb \
+        openjdk-${JAVA_VERSION}-default-jdk
+  fi
+
+  if [[ $INSTALL_DEBUG_TOOLS == full ]]; then
+    echo "Installing full debug tools"
+    wrap apk add --no-cache --no-interactive \
+        bind-tools \
+        curl \
+        iproute2 \
+        iputils \
+        less \
+        nmap \
+        sudo \
+        tzutils \
+        vim
+  fi
 fi

 if $DRY_RUN; then
@@ -227,9 +277,16 @@ if ! command -v pgrep ; then
   exit 1
 fi

+# Java must be accessible for both the frontend and the backend. Verify it is present.
+if ! command -v java; then
+  echo "ERROR: Java cannot be found."
+  exit 1
+fi
+
 # Impala will fail to start if the permissions on /var/tmp are not set to include
 # the sticky bit (i.e. +t). Some versions of Redhat UBI images do not have
 # this set by default, so specifically set the sticky bit for both /tmp and /var/tmp.
+mkdir -p /var/tmp
 chmod a=rwx,o+t /var/tmp /tmp

 # To minimize the size for the Docker image, clean up any unnecessary files.
@@ -239,4 +296,6 @@ if [[ $DISTRIBUTION == Ubuntu ]]; then
 elif [[ $DISTRIBUTION == Redhat ]]; then
   yum clean all
   rm -rf /var/cache/yum/*
+elif [[ $DISTRIBUTION == Chainguard ]]; then
+  rm -rf /var/cache/apk/*
 fi
diff --git a/docker/setup_build_context.py b/docker/setup_build_context.py
index b46d88043..66b5e7eb6 100755
--- a/docker/setup_build_context.py
+++ b/docker/setup_build_context.py
@@ -179,6 +179,11 @@ for kudu_lib_dir in kudu_lib_dirs:
 if not found_kudu_so:
   raise Exception("No Kudu shared object found in search path: {0}".format(kudu_lib_dirs))

+# Add script for installing Bash: Impala scripts need it, but some minimal base images
+# may either omit shells altogether, or contain a simpler, smaller variant, e.g. sh.
+symlink_file_into_dir(
+    os.path.join(IMPALA_HOME, "docker/install_bash_if_needed.sh"), HELPER_DIR)
+
 # Add script for installing OS packages
 symlink_file_into_dir(
     os.path.join(IMPALA_HOME, "docker/install_os_packages.sh"), HELPER_DIR)
diff --git a/tests/common/impala_connection.py b/tests/common/impala_connection.py
index 18e882c32..5e1d06266 100644
--- a/tests/common/impala_connection.py
+++ b/tests/common/impala_connection.py
@@ -131,8 +131,8 @@ def collect_default_query_options(options, name, val, kind):
     return
   name = name.lower()
   val = str(val).strip('"')
-  if ',' in val:
-    # Value is a list. Wrap it with double quote.
+  if ',' in val or '/' in val:
+    # Value is a list or a timezone name containing a slash. Wrap it with double quotes.
     val = '"{}"'.format(val)
   if not val:
     # Value is optional with None as default.
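
The timezone handling that the commit message describes for
bin/start-impala-cluster.py (host TZ first, then the /etc/localtime symlink,
then a bind mount of the real file) can be summarized as a small standalone
function. This is only an illustrative sketch, not the code in the patch: the
names resolve_container_timezone and docker_timezone_args, the zoneinfo_root
parameter, and the (tz_name, mount_localtime) return value are hypothetical
and chosen here for readability.

```python
import os


def resolve_container_timezone(host_tz, localtime_path="/etc/localtime",
                               zoneinfo_root="/usr/share/zoneinfo"):
  """Hypothetical helper: returns (tz_name, mount_localtime).

  tz_name is the value to inject as TZ, or None if no name is available.
  mount_localtime is True when the host's localtime file should be bind
  mounted into the container instead.
  """
  # 1. A TZ variable set on the host takes precedence.
  if host_tz:
    return host_tz, False
  try:
    # 2. A symlink: resolve it and strip the zoneinfo prefix to get the name,
    #    e.g. /usr/share/zoneinfo/Europe/Budapest -> Europe/Budapest.
    if os.path.islink(localtime_path):
      target = os.path.realpath(localtime_path)
      return os.path.relpath(target, zoneinfo_root), False
    # 3. A regular file carries no name, so it has to be mounted as-is.
    if os.path.isfile(localtime_path):
      return None, True
  except OSError:
    pass
  # 4. Nothing usable: the container keeps its own default timezone.
  return None, False


def docker_timezone_args(tz_name, mount_localtime):
  """Hypothetical helper: turns the decision into 'docker run' arguments."""
  args = []
  if tz_name is not None:
    args += ["-e", "TZ={0}".format(tz_name)]
  if mount_localtime:
    args += ["--mount",
             "type=bind,src=/etc/localtime,dst=/etc/localtime,readonly"]
  return args
```

Under these assumptions, a caller would pass os.getenv("TZ") as host_tz and
append the returned list to the container's command line. Returning a single
pair keeps the two mechanisms (environment variable and bind mount) mutually
exclusive, which mirrors the fallback order given in the commit message.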
