Hi all, I wanted to inform the community about a new pull request that adds support for building Wolfi OS-based Docker images as an alternative to our current Alpine-based images.
Our current Alpine-based Docker images use the musl libc library, which can cause compatibility issues with binaries compiled for glibc. While we've resolved this for Pulsar binaries and IO connectors, there are additional limitations:

- Memory allocation tuning: Alpine's malloc library lacks tuneables and documentation for malloc optimization, making performance tuning difficult.
- Transparent Huge Pages (THP) support: for benchmarking and performance optimization, it's important that the malloc library has explicit support for THP, with tuneables to adjust for the access patterns of Pulsar workloads.
- Netty memory allocation: most of Pulsar's and BookKeeper's [1] memory is allocated by Netty through os::malloc. While the Java heap can use THP with `-XX:+UseTransparentHugePages`, this doesn't cover off-heap allocations. With glibc tuneables such as `glibc.malloc.hugetlb=1`, we can enable THP for these critical allocations whenever the allocated size exceeds the configured threshold.

The pull request introduces:

- New Dockerfiles (`Dockerfile.wolfi`) for both the `pulsar` and `pulsar-all` images, using the Wolfi OS base image (cgr.dev/chainguard/wolfi-base) instead of Alpine
- Maven build support via a `-Pdocker-wolfi` profile to activate building the Wolfi-based images

Wolfi OS provides several advantages over Ubuntu-based options:

- Uses `apk` package management like Alpine (familiar workflow), which makes it easy to keep the Wolfi-based images in sync with the Alpine-based ones
- Backed by Chainguard, with nightly security updates for CVEs
- Full glibc support without compatibility layers
- Enables proper memory allocation tuning for performance testing

The main purpose at this time is profiling, where Wolfi-based images are compared to Alpine-based images. This addition doesn't replace our existing Alpine images; besides profiling, it provides an alternative for users who need glibc compatibility and advanced memory tuning capabilities.
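To make the tuning concrete, here's a rough sketch of how the glibc/THP combination could be exercised with such an image. This is not from the PR: the image tag is a placeholder, and the `PULSAR_MEM`/`PULSAR_EXTRA_OPTS` environment variables are assumed to be honored as in the standard Pulsar startup scripts.

```shell
# On the host: restrict THP to "madvise" mode so that only regions which
# explicitly opt in (via madvise(MADV_HUGEPAGE)) are backed by huge pages.
# This also avoids THP side effects for workloads like RocksDB (see [1]).
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# Run a (hypothetical) Wolfi-based image with THP enabled for both heap
# and off-heap memory:
#  - GLIBC_TUNABLES=glibc.malloc.hugetlb=1 makes glibc malloc madvise
#    large allocations, covering Netty's os::malloc off-heap memory
#  - -XX:+UseTransparentHugePages covers the Java heap
docker run \
  -e GLIBC_TUNABLES=glibc.malloc.hugetlb=1 \
  -e PULSAR_MEM="-Xms2g -Xmx2g" \
  -e PULSAR_EXTRA_OPTS="-XX:+UseTransparentHugePages" \
  apachepulsar/pulsar:<tag>-wolfi \
  bin/pulsar standalone
```

On a musl-based Alpine image there is no equivalent of `GLIBC_TUNABLES`, which is exactly the gap described above.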
The performance bottleneck that Transparent Huge Pages can address is the one caused by TLB misses [2]. There's a real-world example in this AWS blog post: https://aws.amazon.com/blogs/compute/using-amazon-aperf-to-go-from-50-below-to-36-above-performance-target/#:~:text=The%20big%20decrease,cycles%20to%20read.

That blog post references a useful open-source (ASL 2.0) tool by AWS, APerf (https://github.com/aws/aperf), which can be used to find these issues. It's also possible to enable TLB-related metrics in the Prometheus node_exporter with the "perf" collector, so that TLB-miss-related bottlenecks can be observed.

The PR is ready for review and testing. Please review https://github.com/apache/pulsar/pull/24692

Since this PR only improves the build and doesn't change how we publish images, I didn't create a PIP for this change. If we later decide to start publishing Wolfi-based images in addition to Alpine-based images, a PIP would be made.

-Lari

1 - Side note: THP is not recommended for RocksDB since it could cause latency spikes for RocksDB workloads. The memory allocated by RocksDB can be excluded from THP when THP has been configured as "madvise". The "madvise" setting is also what glibc malloc relies on: it lets glibc madvise the regions that are eligible for THP.

2 - https://en.wikipedia.org/wiki/Translation_lookaside_buffer#TLB-miss_handling
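P.S. For anyone who wants to try the node_exporter route mentioned above: the perf collector is disabled by default and must be enabled explicitly. A minimal sketch (flag name as in current node_exporter releases; the exact metric set varies by version and hardware):

```shell
# The perf collector reads hardware counters via perf_event_open, which
# usually requires relaxing the kernel's perf_event_paranoid setting.
sudo sysctl -w kernel.perf_event_paranoid=1

# Start node_exporter with the (off-by-default) perf collector enabled.
# Perf-based counters (cycles, instructions, cache/TLB events, etc.) are
# then exposed as node_perf_* metrics on the /metrics endpoint.
node_exporter --collector.perf
```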