Hi all,

I wanted to inform the community about a new pull request that adds
support for building Wolfi OS based Docker images as an alternative to
our current Alpine-based images.

Our current Alpine-based Docker images use musl libc,
which can cause compatibility issues with binaries compiled for glibc.
While we've resolved this for Pulsar binaries and IO connectors, there
are additional limitations:
- Memory allocation tuning: Alpine's musl malloc lacks tunables and
documentation for malloc optimization, making performance tuning
difficult.
- Transparent Huge Pages (THP) support: For benchmarking and
performance optimization, it's important that the malloc library has
explicit support for THP, with tunables to adjust for the access
patterns of Pulsar workloads.
- Netty memory allocation: Most of Pulsar's and BookKeeper's ([1])
memory is allocated by Netty through os::malloc. While the Java heap
can use THP with `-XX:+UseTransparentHugePages`, this doesn't
cover off-heap allocations. With glibc tunables such as
`glibc.malloc.hugetlb=1`, we can enable THP for these critical
allocations when the allocated size is larger than the configured
threshold.
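
To make the above concrete, here is a sketch of how an image
entrypoint could combine the glibc tunable with the JVM flag. The
tunable and flag names are from the glibc and HotSpot documentation;
`pulsar.jar` is a hypothetical placeholder for the actual launcher.

```shell
# Enable madvise-based THP for glibc malloc's large allocations
# (inherited by child processes through the environment):
export GLIBC_TUNABLES=glibc.malloc.hugetlb=1
# Enable THP for the Java heap:
JVM_FLAGS="-XX:+UseTransparentHugePages"
# A real entrypoint would exec this; echoed here for illustration
# (pulsar.jar is a hypothetical placeholder):
echo java $JVM_FLAGS -jar pulsar.jar
```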

The pull request introduces:
- New Dockerfiles (`Dockerfile.wolfi`) for both `pulsar` and `pulsar-all` images
  - uses Wolfi OS base image (cgr.dev/chainguard/wolfi-base) instead of Alpine
- Maven build support via `-Pdocker-wolfi` profile to activate
building Wolfi based images
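
For reference, a build invocation could look like the following
sketch. Only the `-Pdocker-wolfi` profile name comes from the PR; the
module paths and the companion `docker` profile are assumptions based
on the usual Pulsar docker build, so check the PR for the exact
command.

```shell
# Sketch: build the Wolfi-based images alongside the regular docker
# build (module paths and the "docker" profile are assumptions):
mvn install -pl docker/pulsar,docker/pulsar-all -am -DskipTests -Pdocker,docker-wolfi
```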

Wolfi OS provides several advantages over Ubuntu-based options:
- Uses `apk` package management like Alpine (familiar workflow),
making it easy to keep the Wolfi-based images in sync with the
Alpine-based images.
- Backed by Chainguard, with nightly security updates for CVEs
- Full glibc support without compatibility layers
  - Enables proper memory allocation tuning for performance testing

The main purpose at this time is to enable profiling that compares
Wolfi-based images against Alpine-based images.
This addition doesn't replace our existing Alpine images; beyond
profiling, it provides an alternative for users who need glibc
compatibility and advanced memory tuning capabilities.

Transparent Huge Pages can address the performance bottleneck caused
by TLB misses [2]. There's a real-world example in this AWS blog post:
https://aws.amazon.com/blogs/compute/using-amazon-aperf-to-go-from-50-below-to-36-above-performance-target/#:~:text=The%20big%20decrease,cycles%20to%20read.
That blog post references a useful OSS (ASL 2.0) tool by AWS, APerf
(https://github.com/aws/aperf) which can be used to find these issues.
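
A typical APerf workflow could look like this sketch: record a profile
while a benchmark is running, then render a report. The subcommand and
flag names follow the APerf README and may change between releases, so
verify with `aperf --help`.

```shell
# Sketch: capture a 60-second profile during a benchmark run
# (run name is arbitrary), then generate an HTML report from it:
aperf record --run-name pulsar-alpine --period 60
aperf report --run pulsar-alpine
```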
It's also possible to enable TLB-related metrics in Prometheus
node_exporter with the "perf" collector so that TLB-miss-related
bottlenecks can be observed.
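
For example, a sketch of enabling that collector (it is disabled by
default, and opening perf events typically requires lowering the
perf_event_paranoid sysctl or extra capabilities):

```shell
# Allow node_exporter to open perf events (value 1 is an example;
# the appropriate level depends on your security requirements):
sudo sysctl -w kernel.perf_event_paranoid=1
# Run node_exporter with the perf collector enabled to expose
# hardware counter metrics, including TLB miss counts:
node_exporter --collector.perf
```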

The PR is ready for review and testing. Please review
https://github.com/apache/pulsar/pull/24692
Since this PR only improves the build and doesn't change how we
publish images, I didn't create a PIP for this change. If we later
decide to start publishing Wolfi-based images in addition to
Alpine-based images, we would create a PIP at that point.

-Lari

1 - Side note: THP is not recommended for RocksDB since it could cause
latency spikes for RocksDB workloads. When THP is configured with
"madvise", the memory allocated by RocksDB is excluded from THP. The
"madvise" setting is also what glibc malloc relies on: it calls
madvise to mark the regions that are eligible for THP.
2 - https://en.wikipedia.org/wiki/Translation_lookaside_buffer#TLB-miss_handling
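
As a sketch of the "madvise" configuration mentioned in [1], the
system-wide THP mode can be set and checked via sysfs (requires root):

```shell
# Set THP to "madvise" so only regions explicitly marked with
# madvise(MADV_HUGEPAGE) -- e.g. by glibc malloc when the hugetlb
# tunable is set -- are backed by huge pages; RocksDB stays excluded.
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
# The active mode is shown in brackets, e.g. "always [madvise] never":
cat /sys/kernel/mm/transparent_hugepage/enabled
```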
