[
https://issues.apache.org/jira/browse/IMPALA-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939234#comment-17939234
]
ASF subversion and git services commented on IMPALA-13825:
----------------------------------------------------------
Commit e6078b42819b3642d0030992f77ff030abf2db9f in impala's branch
refs/heads/master from Laszlo Gaal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e6078b428 ]
IMPALA-13825: Extend Docker container build to custom base images
Downstream system vendors, users and customers have recently expressed
interest in consuming Impala in containerized form on top of
specialized, hardened container base images, such as those based on
the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.
This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.
Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.
The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
If set to 'true', triggers the use of the custom image.
When set to 'false' or left unspecified, the Docker base image is
selected by the existing logic of matching the build platform's
operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
These variables can be set in the calling environment,
in impala-config-branch.sh, or in impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
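The selection rule described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the function name
select_base_image, the platform_default parameter, and the error raised for
an empty IMPALA_CUSTOM_DOCKER_BASE are assumptions made for the example.

```python
def select_base_image(env, platform_default):
    """Return the Docker base image to build on.

    env              -- mapping of environment variables (e.g. os.environ)
    platform_default -- image chosen by the existing OS-matching logic
                        (hypothetical parameter, supplied by the caller)
    """
    # Only the literal string 'true' switches to the custom image;
    # 'false' or an unset variable keeps the existing behavior.
    if env.get("USE_CUSTOM_IMPALA_BASE_IMAGE") != "true":
        return platform_default

    custom = env.get("IMPALA_CUSTOM_DOCKER_BASE", "").strip()
    if not custom:
        # Assumption: treat a missing image URI as a configuration error.
        raise ValueError(
            "USE_CUSTOM_IMPALA_BASE_IMAGE is 'true' but "
            "IMPALA_CUSTOM_DOCKER_BASE is empty")
    return custom
```

For instance, calling select_base_image(os.environ, "ubuntu:20.04") with
USE_CUSTOM_IMPALA_BASE_IMAGE=true and
IMPALA_CUSTOM_DOCKER_BASE=cgr.dev/chainguard/wolfi-base (an illustrative
image URI) would return the Wolfi image instead of the platform default.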
The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.
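As a rough sketch of the idea (not the script added by this patch, which is
not reproduced here), the Bash check could look like the following. The
helper name bash_install_command, the set of package managers probed, and
their order are assumptions for illustration only.

```python
import shutil

# Candidate package managers and the command each would use to install Bash.
# The list and its order are assumptions, not taken from the patch.
INSTALL_COMMANDS = [
    ("apk", ["apk", "add", "--no-cache", "bash"]),
    ("apt-get", ["apt-get", "install", "-y", "bash"]),
    ("dnf", ["dnf", "install", "-y", "bash"]),
]


def bash_install_command(which=shutil.which):
    """Return the command needed to install Bash, or None if it is present.

    'which' is injectable so the lookup can be exercised without touching
    the real filesystem.
    """
    if which("bash"):
        return None  # Bash already available, nothing to do.
    for tool, command in INSTALL_COMMANDS:
        if which(tool):
            return command
    raise RuntimeError("no supported package manager found")
```

On a Wolfi-style image that ships only apk, this would yield the apk
command; on an image that already contains Bash it yields None.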
To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:
- synchronizes every launched container's timezone to the host.
This is needed for Iceberg time-travel tests, which create timestamped
Iceberg metadata items in the impalad context inside a container, but
check creation/modification times of the same items in the test scripts
running on the host, outside the containers. The test scripts have
the implicit expectation that the same local time is shared across
all these contexts, but this is not necessarily true if the host
where the tests are running is set to a timezone other than UTC.
Time synchronization is achieved by injecting the TZ environment
variable into the container, holding the name of the timezone used
on the host. The timezone name is taken either from the host's TZ
variable (if set), or from the host's /etc/localtime symlink,
checking the name of the timezone file it points to.
In case /etc/localtime is not a symlink (and TZ is not set on the
host), the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM's error
files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
complete $IMPALA_HOME subtree into the container at
/opt/impala/sources to make source code available inside the container
for easier debugging.
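The timezone lookup order described in the first item above can be sketched
as follows. This is a minimal illustration under stated assumptions, not the
code in bin/start-impala-cluster.py: the function name host_timezone_args,
the "zoneinfo/" path marker used to extract the zone name, the read-only
mount flag, and the fallback to mounting when a symlink target is not under
a zoneinfo directory are all choices made for the example.

```python
import os

# Assumption: timezone files live under a directory named "zoneinfo".
ZONEINFO_MARKER = "zoneinfo/"


def host_timezone_args(env, localtime_path="/etc/localtime"):
    """Return 'docker run' style arguments that propagate the host timezone."""
    # 1. Prefer the host's TZ variable if it is set.
    tz = env.get("TZ")
    if tz:
        return ["-e", "TZ=" + tz]

    # 2. Otherwise derive the zone name from the /etc/localtime symlink.
    if os.path.islink(localtime_path):
        target = os.path.realpath(localtime_path)
        idx = target.find(ZONEINFO_MARKER)
        if idx != -1:
            return ["-e", "TZ=" + target[idx + len(ZONEINFO_MARKER):]]

    # 3. Not a symlink (or not recognizable): mount the file itself.
    return ["-v", "{0}:/etc/localtime:ro".format(localtime_path)]
```

The returned list is meant to be appended to the container launch command,
so a host with /etc/localtime pointing at a Europe/Budapest zone file would
pass TZ=Europe/Budapest into each container.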
Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
containers) using Wolfi's wolfi-base containers
Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
Reviewed-by: Jason Fehr <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Impala Docker images should be able to use custom base images
> -------------------------------------------------------------
>
> Key: IMPALA-13825
> URL: https://issues.apache.org/jira/browse/IMPALA-13825
> Project: IMPALA
> Issue Type: New Feature
> Components: Infrastructure
> Reporter: Laszlo Gaal
> Assignee: Laszlo Gaal
> Priority: Major
> Labels: build, docker
>
> Impala already supports running all daemons inside Docker containers, so it
> can be deployed in Kubernetes clusters; some downstream versions are actually
> released this way as products.
> Packaging Impala into this form factor is also supported, but the base
> container images are currently chosen implicitly by the build process, based
> on the native OS platform of the build system. This is done to ensure that
> the Impala binaries produced by the build running natively on the build
> platform can be installed into the Docker containers with no compatibility
> problems as far as library version dependencies are concerned.
> Modern security practices in cloud-based environments have started requiring
> special, locked-down and slimmed-down base images for containerized
> workloads, e.g. distroless images, Ubuntu Chiselled images, or Chainguard's
> Wolfi OS. These all share the common requirement that the Impala binaries
> need to be installable into a specific, prescribed base image, possibly given
> to the build process by some external actor, so Impala's Docker build logic
> should be extended to be able to accept and accommodate such explicitly
> specified external base images.