Re: [DISCUSS] PIP-467: Adopt slog for structured logging across Pulsar

PengHui Li Fri, 03 Apr 2026 17:35:19 -0700

+1

Penghui


On Fri, Apr 3, 2026 at 10:38 AM Matteo Merli <[email protected]> wrote:

> https://github.com/apache/pulsar/pull/25462
>
>
> # Background knowledge
>
> [PIP-89](https://github.com/apache/pulsar/blob/master/pip/pip-89.md)
> introduced the concept of structured event
> logging for Pulsar and was approved in 2021. It proposed replacing SLF4J
> string-interpolated logging with structured
> key-value logging, using a custom wrapper (`structured-event-log` module)
> built on top of SLF4J's MDC.
>
> Since then, a standalone library called [slog](
> https://github.com/merlimat/slog) has been developed that provides a
> more complete and performant implementation of structured logging. slog
> builds on the same principles as PIP-89 but
> with a significantly improved API, better performance characteristics, and
> support for all standard log levels.
>
> ### Structured logging
>
> Traditional logging uses string interpolation to embed values into log
> messages:
>
> ```java
> log.info("[{}] Opening ledger {} with {} entries", name, ledgerId,
> entries);
> ```
>
> Structured logging instead attaches context as typed key-value attributes:
>
> ```java
> log.info().attr("ledgerId", ledgerId).attr("entries",
> entries).log("Opening
> ledger");
> ```
>
> This makes log records machine-parseable, filterable, and queryable without
> complex regex patterns.
>
> ### slog API
>
> slog provides a `Logger` interface with the following key features:
>
> * **All standard levels**: `trace()`, `debug()`, `info()`, `warn()`,
> `error()` — each with four variants:
>   simple message, printf-formatted, fluent builder, and
> lambda/consumer-based.
> * **Fluent Event builder**: Chain `.attr(key, value)` calls to attach
> structured attributes, with primitive overloads
>   (`long`, `int`, `double`, `float`, `boolean`) to avoid autoboxing.
> * **Derived loggers with context**: `logger.with().attr("topic",
> topic).build()` creates a child logger where
>   context attributes are automatically included in every log call.
> * **Context composition**: `logger.with().ctx(otherLogger).build()`
> inherits context from another logger, enabling
>   cross-component context propagation without coupling.
> * **Zero overhead when disabled**: Level guards are checked before
> constructing any objects. The fluent builders
>   return a no-op singleton `Event` when the level is disabled, making all
> chained calls effectively free.
> * **Timed events**: `log.info().timed()` automatically records elapsed
> duration between the call and `.log()`.
> * **Lazy evaluation**: `log.debug(e -> e.attr(...).log(...))` — the lambda
> only executes when the level is enabled.
> * **Exception support**: `.exception(t)` for full stack traces,
> `.exceptionMessage(t)` for message-only.
> * **Lombok integration**: `@CustomLog` annotation generates the static
> logger field automatically.
>
> slog auto-discovers the backend at runtime: Log4j2 (preferred) or SLF4J
> 1.x/2.x as fallback.
>
> # Motivation
>
> While PIP-89 was approved and the `structured-event-log` module was added
> to the Pulsar codebase, adoption has been
> very limited. Almost all Pulsar components still use traditional SLF4J
> logging with string interpolation. The
> `structured-event-log` module's API is verbose and lacks support for
> standard log levels like `debug` and `trace`,
> which has hindered adoption.
>
> Meanwhile, the slog library has evolved to address these shortcomings with
> a cleaner, more complete API. This PIP
> proposes to adopt slog as the standard logging approach across all Pulsar
> components, replacing both direct SLF4J
> usage and the internal `structured-event-log` module.
>
> ### Problems with the current state
>
> **Inconsistent log formatting**: Every developer formats log messages
> differently. The same information appears in
> different positions, with different delimiters, and different key naming:
> ```java
> // Various inconsistent patterns found across the codebase:
> log.info("[{}] Opening managed ledger {}", name, ledgerId);
> log.info("[{}] [{}] Cursor {} recovered to position {}", ledgerName,
> cursorName, cursor, position);
> log.warn("Error opening ledger {} for cursor {}: {}", ledgerId, cursorName,
> errorMsg);
> ```
>
> **Repeated boilerplate context**: The managed ledger name, cursor name,
> topic name, and other context identifiers
> are repeated as positional arguments in nearly every log statement — often
> as the first one or two `{}` placeholders.
> This is verbose, error-prone, and frequently inconsistent.
>
> **isDebugEnabled guards**: Debug-level logging requires explicit level
> guards to avoid the cost of string
> construction and argument evaluation:
> ```java
> if (log.isDebugEnabled()) {
>     log.debug("[{}] Reading entry ledger {}: {}", name,
> position.getLedgerId(), position.getEntryId());
> }
> ```
> This adds significant boilerplate — in `ManagedLedgerImpl` alone, there
> were 58 such guards.
>
> **Poor queryability**: Log analysis tools (Splunk, Elasticsearch, Loki)
> cannot reliably extract values embedded in
> interpolated strings. Regex-based parsing is fragile and varies by message
> format.
>
> **Missing context in error scenarios**: When debugging production issues,
> engineers frequently find that critical
> context was omitted from log messages because adding another positional
> argument was cumbersome.
>
> ### Why slog over `structured-event-log`
>
> * **Complete level support**: slog supports all five standard levels
> (`trace`, `debug`, `info`, `warn`, `error`),
>   while `structured-event-log` only supports `info`, `warn`, and `error`.
> * **Simpler API**: slog's `log.info().attr("k", v).log("msg")` is more
> natural than
>   `log.newRootEvent().resource("k", v).log("msg")`.
> * **Better performance**: slog uses a cached generation-counter scheme for
> level checks with a single integer
>   comparison and no volatile fence. Primitive `attr()` overloads avoid
> autoboxing.
> * **Derived loggers**: `logger.with().attr(...)` creates an immutable child
> logger with persistent context — cleaner
>   than creating `EventResources` objects.
> * **Lazy evaluation**: Lambda-based logging (`log.debug(e -> ...)`)
> eliminates `isDebugEnabled` guards entirely.
> * **Lombok integration**: `@CustomLog` works with slog, reducing logger
> declaration boilerplate.
> * **Standalone library**: slog is maintained as an independent library,
> decoupled from Pulsar's release cycle.
>   Being a separate library makes it easier to adopt across related projects
> (e.g., BookKeeper, Oxia Java client
>   which already uses slog), enabling consistent structured logging and
> cross-project context propagation.
>
> # Goals
>
> ## In Scope
>
> * **Adopt slog** as the standard structured logging library for all Pulsar
> components.
> * **Remove the internal `structured-event-log` module** from the Pulsar
> codebase, since slog supersedes it.
> * **Migrate all components** from SLF4J to slog, one module at a time.
> Components can be migrated independently
>   and there is no concern with a coexistence period where some components
> use SLF4J and others use slog.
> * **Establish logging conventions**: standardized attribute names
> (`managedLedger`, `cursor`, `ledgerId`, `topic`,
>   `position`, etc.) to ensure consistency across the codebase.
> * **Use derived loggers for context propagation**: Classes that carry
> identity (e.g., a managed ledger's name, a
>   cursor's name, a topic) should create derived loggers that automatically
> attach this context to every log call.
>
> * **Cross-boundary context propagation with BookKeeper**: Since Pulsar
> embeds BookKeeper, slog's context
>   composition (`ctx()`) enables sharing context across the Pulsar →
> BookKeeper boundary. For example, a BookKeeper
>   write failure event log would automatically carry the Pulsar topic name
> as an attribute, without BookKeeper needing
>   to know about Pulsar concepts. This dramatically improves debuggability
> of storage-layer issues.
>
> ## Out of Scope
>
> * Defining centralized event enums (as discussed in PIP-89). slog uses
> string messages, and adopting an enum-based
>   log event model would be highly impractical given the scale of the
> codebase.
> * Modifying the log output format or Log4j2 configuration. slog works
> transparently with existing logging backends
>   and configurations.
> * Introducing distributed tracing or trace ID propagation. This is
> orthogonal and can be layered on top later.
>
> ### Pulsar client libraries
>
> The `pulsar-client` module is a special case because its logs are emitted
> within users' applications rather than
> on the server side. slog auto-discovers the logging backend at runtime
> (Log4j2 or SLF4J), so it will integrate
> transparently with whatever logging framework the user's application
> already has configured. No action is required
> by users — the client will simply emit structured log records through the
> existing backend, and users who configure
> a JSON appender will automatically benefit from the structured attributes.
>
> # High Level Design
>
> The migration follows a straightforward, incremental approach:
>
> 1. **Add slog as a dependency** in the Pulsar build system (Gradle version
> catalog).
> 2. **Configure Lombok** with `@CustomLog` support for slog.
> 3. **Migrate modules one at a time**, starting with `managed-ledger` as the
> first module (already done as a
>    proof of concept). Each module migration involves:
>    - Replacing SLF4J imports and logger declarations with slog.
>    - Converting log statements from parameterized format to structured
> format.
>    - Introducing derived loggers for repeated context (e.g., ledger name,
> cursor name).
>    - Removing `isDebugEnabled()` / `isTraceEnabled()` guards.
> 4. **Delete the `structured-event-log` module** — it has no existing usages
> and can be removed immediately.
>
> Since slog auto-discovers the logging backend (Log4j2 or SLF4J), no changes
> to logging configuration files are
> needed. The structured attributes are emitted via the backend's native
> mechanism (Log4j2 key-value pairs or SLF4J
> MDC), so existing log appenders and formats continue to work. Users who
> want to leverage structured attributes in
> their log output can configure a JSON appender.
>
> # Detailed Design
>
> ## Design & Implementation Details
>
> ### Logger declaration
>
> For simple classes, use Lombok's `@CustomLog` annotation:
>
> ```java
> @CustomLog
> public class OpAddEntry implements AddCallback {
>     // Lombok generates: private static final Logger log =
> Logger.get(OpAddEntry.class);
>
>     public void initiate() {
>         log.info().attr("ledgerId", ledger.getId()).log("Initiating add
> entry");
>     }
> }
> ```
>
> For classes with persistent identity context (e.g., a managed ledger, a
> cursor, a topic handler), create a
> derived instance logger:
>
> ```java
> public class ManagedLedgerImpl implements ManagedLedger {
>     private static final Logger slog = Logger.get(ManagedLedgerImpl.class);
>     protected final Logger log;
>
>     public ManagedLedgerImpl(..., String name, ...) {
>         this.log = slog.with().attr("managedLedger", name).build();
>     }
>
>     // All subsequent log calls automatically include managedLedger=<name>
>     public void openLedger(long ledgerId) {
>         log.info().attr("ledgerId", ledgerId).log("Opening ledger");
>         // Emits: msg="Opening ledger" managedLedger=my-topic
> ledgerId=12345
>     }
> }
> ```
>
> ### Context composition across components
>
> Classes that operate on behalf of another component can inherit its
> context:
>
> ```java
> public class RangeEntryCacheImpl implements EntryCache {
>     private static final Logger slog =
> Logger.get(RangeEntryCacheImpl.class);
>     private final Logger log;
>
>     RangeEntryCacheImpl(ManagedLedgerImpl ml, ...) {
>         this.log = slog.with().ctx(ml.getLogger()).build();
>         // Logger name is RangeEntryCacheImpl, but carries managedLedger
> context from ml
>     }
> }
> ```
>
> ### Cross-boundary context: Pulsar → BookKeeper
>
> One of the most powerful benefits of slog's context model is the ability to
> propagate context across component
> boundaries. Since Pulsar embeds BookKeeper, a BookKeeper ledger operation
> can inherit context from the Pulsar
> component that initiated it:
>
> ```java
> // In Pulsar's ManagedLedgerImpl (already has managedLedger context)
> public void createLedger() {
>     // Pass the Pulsar logger context to BookKeeper
>     bookKeeper.asyncCreateLedger(..., log, ...);
> }
>
> // In BookKeeper's LedgerHandle
> public class LedgerHandle {
>     private final Logger log;
>
>     LedgerHandle(Logger pulsarLogger, ...) {
>         // BookKeeper's logger inherits Pulsar's context
>         this.log = slog.with().ctx(pulsarLogger).attr("ledgerId",
> ledgerId).build();
>     }
>
>     void handleWriteError(int rc) {
>         // This log entry automatically includes managedLedger=my-topic
> from Pulsar's context
>         log.error().attr("rc", BKException.getMessage(rc)).log("Write
> failed");
>         // Emits: msg="Write failed" managedLedger=my-topic ledgerId=12345
> rc=NotEnoughBookies
>     }
> }
> ```
>
> This eliminates the long-standing problem of BookKeeper errors being logged
> without any Pulsar-level context,
> making it very difficult to correlate storage failures with the affected
> topics.
>
> ### Conversion patterns
>
> | SLF4J pattern | slog equivalent |
> |---|---|
> | `log.info("msg")` | `log.info("msg")` |
> | `log.info("[{}] msg {}", name, val)` | `log.info().attr("key",
> val).log("msg")` |
> | `log.error("msg", exception)` | `log.error().exception(e).log("msg")` |
> | `log.warn("msg: {}", e.getMessage())` |
> `log.warn().exceptionMessage(e).log("msg")` |
> | `if (log.isDebugEnabled()) { log.debug(...) }` |
> `log.debug().attr(...).log(...)` |
> | Manual timing with `System.nanoTime()` | `log.info().timed() ...
> event.log("done")` |
>
> ### Attribute naming conventions
>
> To ensure consistency across the codebase, the following attribute names
> should be used:
>
> | Attribute | Description | Example value |
> |---|---|---|
> | `managedLedger` | Managed ledger name | `my-tenant/my-ns/my-topic` |
> | `cursor` | Cursor name | `my-subscription` |
> | `topic` | Topic name | `persistent://tenant/ns/topic` |
> | `ledgerId` | BookKeeper ledger ID | `12345` |
> | `entryId` | Entry ID within a ledger | `42` |
> | `position` | Ledger position (ledgerId:entryId) | `12345:42` |
> | `readPosition` | Current read position | `12345:42` |
> | `markDeletePosition` | Mark-delete position | `12345:41` |
> | `state` | Component state | `Open`, `Closed` |
> | `rc` | BookKeeper return code | `OK`, `NoSuchLedger` |
> | `subscription` | Subscription name | `my-sub` |
> | `consumer` | Consumer name | `consumer-0` |
> | `producer` | Producer name | `producer-0` |
>
> ## Public-facing Changes
>
> ### Configuration
>
> A new Lombok configuration is added to `lombok.config`:
>
> ```properties
> lombok.log.custom.declaration = io.github.merlimat.slog.Logger
> io.github.merlimat.slog.Logger.get(TYPE)
> ```
>
> ### Log output
>
> The log message format changes. With traditional SLF4J:
> ```
> [ManagedLedgerImpl] [my-topic] Opening ledger 12345 with 100 entries
> ```
>
> With slog (pattern appender):
> ```
> [ManagedLedgerImpl] Opening ledger {managedLedger=my-topic, ledgerId=12345,
> entries=100}
> ```
>
> With slog (Log4j2 JSON appender):
> ```json
> {
>   "instant": {"epochSecond": 1775174210, "nanoOfSecond": 277000000},
>   "thread": "pulsar-ordered-executor-0-0",
>   "level": "INFO",
>   "loggerName": "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl",
>   "message": "Opening ledger",
>   "contextMap": {
>     "managedLedger": "my-topic",
>     "ledgerId": "12345",
>     "entries": "100"
>   },
>   "endOfBatch": false,
>   "loggerFqcn": "io.github.merlimat.slog.impl.EventImpl",
>   "threadId": 42,
>   "threadPriority": 5
> }
> ```
>
> Users who rely on parsing specific log message patterns will need to update
> their parsers. However, the structured
> format makes parsing significantly easier and more reliable.
>
> # Monitoring
>
> No changes to monitoring. Structured logging improves the ability to create
> log-based alerts and dashboards by
> making attribute-based filtering possible (e.g., alert on all `ERROR` logs
> where `managedLedger=X`).
>
> # Security Considerations
>
> No security implications. This is an internal change to how log messages
> are formatted. No new endpoints, commands,
> or permissions are introduced.
>
> # Backward & Forward Compatibility
>
> ## Upgrade
>
> * Log message formats will change as modules are migrated. Users who parse
> log messages with regex patterns will
>   need to update their parsers. The structured format is strictly more
> parseable than the previous format.
> * To get the most out of structured logging, users should consider
> switching to a JSON log appender if they haven't
>   already. However, this is optional — traditional pattern-based appenders
> continue to work.
> * No configuration changes are required for the upgrade.
>
> ## Downgrade / Rollback
>
> No special steps required. Reverting to a previous version restores the
> original SLF4J logging.
>
> ## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations
>
> No impact on geo-replication. Logging is a local concern and does not
> affect inter-cluster communication.
>
> # Alternatives
>
> ### Continue using `structured-event-log`
>
> The existing internal module could be extended. However, it lacks
> debug/trace levels, has a more verbose API, and
> maintaining a custom logging library inside Pulsar adds unnecessary burden.
> slog is a superset of the
> `structured-event-log` API with better performance and more features.
>
> ### Use SLF4J 2.0 Fluent API
>
> SLF4J 2.0 introduced a fluent logging API (`log.atInfo().addKeyValue("k",
> v).log("msg")`). However:
> * It does not support derived loggers with persistent context.
> * It does not support context composition across components.
> * No support for timed events or sampling.
> * Attribute values are stored as strings only (no primitive overloads).
> * Level guards still allocate objects in some implementations.
>
> ### Use Log4j2 directly
>
> Log4j2 does not expose a public structured logging API for applications.
> While it is possible to pass structured
> key-value data through low-level, non-public Log4j2 interfaces (which is
> what slog does internally as its
> backend), this is not something easily or reliably usable directly from
> application code. It would also tightly
> couple Pulsar to Log4j2 as the only backend, preventing users from using
> other SLF4J-compatible backends.
> slog abstracts over these internals and supports both Log4j2 and SLF4J
> backends transparently.
>
> ### Use Flogger
>
> [Flogger](https://github.com/google/flogger) is Google's fluent logging
> API
> for Java. While it offers a fluent API
> and some structured logging features, it has significant performance
> drawbacks:
> * Extremely slow on the enabled path (~0.7 ops/μs vs slog's 13–25 ops/μs —
> roughly **18–35× slower**).
> * Very high allocation rate (1624–1904 B/op vs slog's 0–80 B/op — **20–24×
> more allocations**).
> * No support for derived loggers with persistent context or context
> composition.
> * No support for timed events.
> * Tightly coupled to Google's internal infrastructure; limited backend
> flexibility.
>
> ### Performance comparison
>
> Benchmarks from the [slog repository](https://github.com/merlimat/slog)
> (JMH, higher ops/μs is better, lower B/op
> is better):
>
> **Disabled path** (log level is off — measures the cost of checking the
> level):
>
> | Library              | ops/μs  |
> |----------------------|---------|
> | slog                 | 1005.4  |
> | Log4j2               | 413.1   |
> | SLF4J                | 367.5   |
> | Flogger              | 360.3   |
>
> slog's disabled path is **2.4× faster** than Log4j2 and **2.8× faster**
> than SLF4J/Flogger. This matters because
> the vast majority of debug/trace log statements are disabled in production.
>
> **Enabled path** (log level is on — measures the cost of formatting and
> emitting a log message with 2 key-value
> attributes):
>
> | Library              | ops/μs |
> |----------------------|--------|
> | slog (simple)        | 24.6   |
> | Log4j2 (simple)      | 14.6   |
> | slog (fluent)        | 13.2   |
> | slog (fluent+ctx)    | 12.0   |
> | SLF4J (simple)       | 11.2   |
> | SLF4J (fluent)       | 4.2    |
> | Flogger              | 0.7    |
>
> slog is the fastest in both simple and fluent modes. Flogger is **18–35×
> slower** than slog.
>
> **Allocation rate** (bytes allocated per log operation):
>
> | Library              | B/op |
> |----------------------|------|
> | slog (simple)        | 0    |
> | Log4j2 (simple)      | 24   |
> | SLF4J (simple)       | 24   |
> | slog (fluent+ctx)    | 40   |
> | slog (fluent)        | 80   |
> | SLF4J (fluent)       | 1104 |
> | Flogger              | 1624 |
>
> slog's fluent API allocates **6× less** than Log4j2's fluent API, **14×
> less** than SLF4J's fluent API, and
> **20× less** than Flogger. Since structured logging requires the fluent
> API, this is the most relevant comparison
> for this PIP's use case.
>
> # General Notes
>
> The `managed-ledger` module has already been fully migrated as a proof of
> concept, converting all source and test files
> (54 files, ~400 log statements). The migration reduced total lines of code
> (1481 additions vs 1392 deletions) while
> making every log statement more informative and consistent.
>
> # Links
>
> * Original PIP-89:
> https://github.com/apache/pulsar/blob/master/pip/pip-89.md
> * slog library: https://github.com/merlimat/slog
> * PoC: managed-ledger migration:
> https://github.com/merlimat/pulsar/pull/20
>
>
> --
> Matteo Merli
> <[email protected]>
>

Re: [DISCUSS] PIP-467: Adopt slog for structured logging across Pulsar

Reply via email to