Re: [DISCUSS] PIP-467: Adopt slog for structured logging across Pulsar

mattison chao Sun, 05 Apr 2026 10:04:15 -0700

+1

On Sun, 5 Apr 2026 at 13:40, Tao Jiuming <[email protected]> wrote:


> +1
>
> Matteo Merli <[email protected]>于2026年4月4日 周六01:38写道：
>
> > https://github.com/apache/pulsar/pull/25462
> >
> >
> > # Background knowledge
> >
> > [PIP-89](https://github.com/apache/pulsar/blob/master/pip/pip-89.md)
> > introduced the concept of structured event
> > logging for Pulsar and was approved in 2021. It proposed replacing SLF4J
> > string-interpolated logging with structured
> > key-value logging, using a custom wrapper (`structured-event-log` module)
> > built on top of SLF4J's MDC.
> >
> > Since then, a standalone library called [slog](
> > https://github.com/merlimat/slog) has been developed that provides a
> > more complete and performant implementation of structured logging. slog
> > builds on the same principles as PIP-89 but
> > with a significantly improved API, better performance characteristics,
> and
> > support for all standard log levels.
> >
> > ### Structured logging
> >
> > Traditional logging uses string interpolation to embed values into log
> > messages:
> >
> > ```java
> > log.info("[{}] Opening ledger {} with {} entries", name, ledgerId,
> > entries);
> > ```
> >
> > Structured logging instead attaches context as typed key-value
> attributes:
> >
> > ```java
> > log.info().attr("ledgerId", ledgerId).attr("entries",
> > entries).log("Opening
> > ledger");
> > ```
> >
> > This makes log records machine-parseable, filterable, and queryable
> without
> > complex regex patterns.
> >
> > ### slog API
> >
> > slog provides a `Logger` interface with the following key features:
> >
> > * **All standard levels**: `trace()`, `debug()`, `info()`, `warn()`,
> > `error()` — each with four variants:
> >   simple message, printf-formatted, fluent builder, and
> > lambda/consumer-based.
> > * **Fluent Event builder**: Chain `.attr(key, value)` calls to attach
> > structured attributes, with primitive overloads
> >   (`long`, `int`, `double`, `float`, `boolean`) to avoid autoboxing.
> > * **Derived loggers with context**: `logger.with().attr("topic",
> > topic).build()` creates a child logger where
> >   context attributes are automatically included in every log call.
> > * **Context composition**: `logger.with().ctx(otherLogger).build()`
> > inherits context from another logger, enabling
> >   cross-component context propagation without coupling.
> > * **Zero overhead when disabled**: Level guards are checked before
> > constructing any objects. The fluent builders
> >   return a no-op singleton `Event` when the level is disabled, making all
> > chained calls effectively free.
> > * **Timed events**: `log.info().timed()` automatically records elapsed
> > duration between the call and `.log()`.
> > * **Lazy evaluation**: `log.debug(e -> e.attr(...).log(...))` — the
> lambda
> > only executes when the level is enabled.
> > * **Exception support**: `.exception(t)` for full stack traces,
> > `.exceptionMessage(t)` for message-only.
> > * **Lombok integration**: `@CustomLog` annotation generates the static
> > logger field automatically.
> >
> > slog auto-discovers the backend at runtime: Log4j2 (preferred) or SLF4J
> > 1.x/2.x as fallback.
> >
> > # Motivation
> >
> > While PIP-89 was approved and the `structured-event-log` module was added
> > to the Pulsar codebase, adoption has been
> > very limited. Almost all Pulsar components still use traditional SLF4J
> > logging with string interpolation. The
> > `structured-event-log` module's API is verbose and lacks support for
> > standard log levels like `debug` and `trace`,
> > which has hindered adoption.
> >
> > Meanwhile, the slog library has evolved to address these shortcomings
> with
> > a cleaner, more complete API. This PIP
> > proposes to adopt slog as the standard logging approach across all Pulsar
> > components, replacing both direct SLF4J
> > usage and the internal `structured-event-log` module.
> >
> > ### Problems with the current state
> >
> > **Inconsistent log formatting**: Every developer formats log messages
> > differently. The same information appears in
> > different positions, with different delimiters, and different key naming:
> > ```java
> > // Various inconsistent patterns found across the codebase:
> > log.info("[{}] Opening managed ledger {}", name, ledgerId);
> > log.info("[{}] [{}] Cursor {} recovered to position {}", ledgerName,
> > cursorName, cursor, position);
> > log.warn("Error opening ledger {} for cursor {}: {}", ledgerId,
> cursorName,
> > errorMsg);
> > ```
> >
> > **Repeated boilerplate context**: The managed ledger name, cursor name,
> > topic name, and other context identifiers
> > are repeated as positional arguments in nearly every log statement —
> often
> > as the first one or two `{}` placeholders.
> > This is verbose, error-prone, and frequently inconsistent.
> >
> > **isDebugEnabled guards**: Debug-level logging requires explicit level
> > guards to avoid the cost of string
> > construction and argument evaluation:
> > ```java
> > if (log.isDebugEnabled()) {
> >     log.debug("[{}] Reading entry ledger {}: {}", name,
> > position.getLedgerId(), position.getEntryId());
> > }
> > ```
> > This adds significant boilerplate — in `ManagedLedgerImpl` alone, there
> > were 58 such guards.
> >
> > **Poor queryability**: Log analysis tools (Splunk, Elasticsearch, Loki)
> > cannot reliably extract values embedded in
> > interpolated strings. Regex-based parsing is fragile and varies by
> message
> > format.
> >
> > **Missing context in error scenarios**: When debugging production issues,
> > engineers frequently find that critical
> > context was omitted from log messages because adding another positional
> > argument was cumbersome.
> >
> > ### Why slog over `structured-event-log`
> >
> > * **Complete level support**: slog supports all five standard levels
> > (`trace`, `debug`, `info`, `warn`, `error`),
> >   while `structured-event-log` only supports `info`, `warn`, and `error`.
> > * **Simpler API**: slog's `log.info().attr("k", v).log("msg")` is more
> > natural than
> >   `log.newRootEvent().resource("k", v).log("msg")`.
> > * **Better performance**: slog uses a cached generation-counter scheme
> for
> > level checks with a single integer
> >   comparison and no volatile fence. Primitive `attr()` overloads avoid
> > autoboxing.
> > * **Derived loggers**: `logger.with().attr(...)` creates an immutable
> child
> > logger with persistent context — cleaner
> >   than creating `EventResources` objects.
> > * **Lazy evaluation**: Lambda-based logging (`log.debug(e -> ...)`)
> > eliminates `isDebugEnabled` guards entirely.
> > * **Lombok integration**: `@CustomLog` works with slog, reducing logger
> > declaration boilerplate.
> > * **Standalone library**: slog is maintained as an independent library,
> > decoupled from Pulsar's release cycle.
> >   Being a separate library makes it easier to adopt across related
> projects
> > (e.g., BookKeeper, Oxia Java client
> >   which already uses slog), enabling consistent structured logging and
> > cross-project context propagation.
> >
> > # Goals
> >
> > ## In Scope
> >
> > * **Adopt slog** as the standard structured logging library for all
> Pulsar
> > components.
> > * **Remove the internal `structured-event-log` module** from the Pulsar
> > codebase, since slog supersedes it.
> > * **Migrate all components** from SLF4J to slog, one module at a time.
> > Components can be migrated independently
> >   and there is no concern with a coexistence period where some components
> > use SLF4J and others use slog.
> > * **Establish logging conventions**: standardized attribute names
> > (`managedLedger`, `cursor`, `ledgerId`, `topic`,
> >   `position`, etc.) to ensure consistency across the codebase.
> > * **Use derived loggers for context propagation**: Classes that carry
> > identity (e.g., a managed ledger's name, a
> >   cursor's name, a topic) should create derived loggers that
> automatically
> > attach this context to every log call.
> >
> > * **Cross-boundary context propagation with BookKeeper**: Since Pulsar
> > embeds BookKeeper, slog's context
> >   composition (`ctx()`) enables sharing context across the Pulsar →
> > BookKeeper boundary. For example, a BookKeeper
> >   write failure event log would automatically carry the Pulsar topic name
> > as an attribute, without BookKeeper needing
> >   to know about Pulsar concepts. This dramatically improves debuggability
> > of storage-layer issues.
> >
> > ## Out of Scope
> >
> > * Defining centralized event enums (as discussed in PIP-89). slog uses
> > string messages, and adopting an enum-based
> >   log event model would be highly impractical given the scale of the
> > codebase.
> > * Modifying the log output format or Log4j2 configuration. slog works
> > transparently with existing logging backends
> >   and configurations.
> > * Introducing distributed tracing or trace ID propagation. This is
> > orthogonal and can be layered on top later.
> >
> > ### Pulsar client libraries
> >
> > The `pulsar-client` module is a special case because its logs are emitted
> > within users' applications rather than
> > on the server side. slog auto-discovers the logging backend at runtime
> > (Log4j2 or SLF4J), so it will integrate
> > transparently with whatever logging framework the user's application
> > already has configured. No action is required
> > by users — the client will simply emit structured log records through the
> > existing backend, and users who configure
> > a JSON appender will automatically benefit from the structured
> attributes.
> >
> > # High Level Design
> >
> > The migration follows a straightforward, incremental approach:
> >
> > 1. **Add slog as a dependency** in the Pulsar build system (Gradle
> version
> > catalog).
> > 2. **Configure Lombok** with `@CustomLog` support for slog.
> > 3. **Migrate modules one at a time**, starting with `managed-ledger` as
> the
> > first module (already done as a
> >    proof of concept). Each module migration involves:
> >    - Replacing SLF4J imports and logger declarations with slog.
> >    - Converting log statements from parameterized format to structured
> > format.
> >    - Introducing derived loggers for repeated context (e.g., ledger name,
> > cursor name).
> >    - Removing `isDebugEnabled()` / `isTraceEnabled()` guards.
> > 4. **Delete the `structured-event-log` module** — it has no existing
> usages
> > and can be removed immediately.
> >
> > Since slog auto-discovers the logging backend (Log4j2 or SLF4J), no
> changes
> > to logging configuration files are
> > needed. The structured attributes are emitted via the backend's native
> > mechanism (Log4j2 key-value pairs or SLF4J
> > MDC), so existing log appenders and formats continue to work. Users who
> > want to leverage structured attributes in
> > their log output can configure a JSON appender.
> >
> > # Detailed Design
> >
> > ## Design & Implementation Details
> >
> > ### Logger declaration
> >
> > For simple classes, use Lombok's `@CustomLog` annotation:
> >
> > ```java
> > @CustomLog
> > public class OpAddEntry implements AddCallback {
> >     // Lombok generates: private static final Logger log =
> > Logger.get(OpAddEntry.class);
> >
> >     public void initiate() {
> >         log.info().attr("ledgerId", ledger.getId()).log("Initiating add
> > entry");
> >     }
> > }
> > ```
> >
> > For classes with persistent identity context (e.g., a managed ledger, a
> > cursor, a topic handler), create a
> > derived instance logger:
> >
> > ```java
> > public class ManagedLedgerImpl implements ManagedLedger {
> >     private static final Logger slog =
> Logger.get(ManagedLedgerImpl.class);
> >     protected final Logger log;
> >
> >     public ManagedLedgerImpl(..., String name, ...) {
> >         this.log = slog.with().attr("managedLedger", name).build();
> >     }
> >
> >     // All subsequent log calls automatically include
> managedLedger=<name>
> >     public void openLedger(long ledgerId) {
> >         log.info().attr("ledgerId", ledgerId).log("Opening ledger");
> >         // Emits: msg="Opening ledger" managedLedger=my-topic
> > ledgerId=12345
> >     }
> > }
> > ```
> >
> > ### Context composition across components
> >
> > Classes that operate on behalf of another component can inherit its
> > context:
> >
> > ```java
> > public class RangeEntryCacheImpl implements EntryCache {
> >     private static final Logger slog =
> > Logger.get(RangeEntryCacheImpl.class);
> >     private final Logger log;
> >
> >     RangeEntryCacheImpl(ManagedLedgerImpl ml, ...) {
> >         this.log = slog.with().ctx(ml.getLogger()).build();
> >         // Logger name is RangeEntryCacheImpl, but carries managedLedger
> > context from ml
> >     }
> > }
> > ```
> >
> > ### Cross-boundary context: Pulsar → BookKeeper
> >
> > One of the most powerful benefits of slog's context model is the ability
> to
> > propagate context across component
> > boundaries. Since Pulsar embeds BookKeeper, a BookKeeper ledger operation
> > can inherit context from the Pulsar
> > component that initiated it:
> >
> > ```java
> > // In Pulsar's ManagedLedgerImpl (already has managedLedger context)
> > public void createLedger() {
> >     // Pass the Pulsar logger context to BookKeeper
> >     bookKeeper.asyncCreateLedger(..., log, ...);
> > }
> >
> > // In BookKeeper's LedgerHandle
> > public class LedgerHandle {
> >     private final Logger log;
> >
> >     LedgerHandle(Logger pulsarLogger, ...) {
> >         // BookKeeper's logger inherits Pulsar's context
> >         this.log = slog.with().ctx(pulsarLogger).attr("ledgerId",
> > ledgerId).build();
> >     }
> >
> >     void handleWriteError(int rc) {
> >         // This log entry automatically includes managedLedger=my-topic
> > from Pulsar's context
> >         log.error().attr("rc", BKException.getMessage(rc)).log("Write
> > failed");
> >         // Emits: msg="Write failed" managedLedger=my-topic
> ledgerId=12345
> > rc=NotEnoughBookies
> >     }
> > }
> > ```
> >
> > This eliminates the long-standing problem of BookKeeper errors being
> logged
> > without any Pulsar-level context,
> > making it very difficult to correlate storage failures with the affected
> > topics.
> >
> > ### Conversion patterns
> >
> > | SLF4J pattern | slog equivalent |
> > |---|---|
> > | `log.info("msg")` | `log.info("msg")` |
> > | `log.info("[{}] msg {}", name, val)` | `log.info().attr("key",
> > val).log("msg")` |
> > | `log.error("msg", exception)` | `log.error().exception(e).log("msg")` |
> > | `log.warn("msg: {}", e.getMessage())` |
> > `log.warn().exceptionMessage(e).log("msg")` |
> > | `if (log.isDebugEnabled()) { log.debug(...) }` |
> > `log.debug().attr(...).log(...)` |
> > | Manual timing with `System.nanoTime()` | `log.info().timed() ...
> > event.log("done")` |
> >
> > ### Attribute naming conventions
> >
> > To ensure consistency across the codebase, the following attribute names
> > should be used:
> >
> > | Attribute | Description | Example value |
> > |---|---|---|
> > | `managedLedger` | Managed ledger name | `my-tenant/my-ns/my-topic` |
> > | `cursor` | Cursor name | `my-subscription` |
> > | `topic` | Topic name | `persistent://tenant/ns/topic` |
> > | `ledgerId` | BookKeeper ledger ID | `12345` |
> > | `entryId` | Entry ID within a ledger | `42` |
> > | `position` | Ledger position (ledgerId:entryId) | `12345:42` |
> > | `readPosition` | Current read position | `12345:42` |
> > | `markDeletePosition` | Mark-delete position | `12345:41` |
> > | `state` | Component state | `Open`, `Closed` |
> > | `rc` | BookKeeper return code | `OK`, `NoSuchLedger` |
> > | `subscription` | Subscription name | `my-sub` |
> > | `consumer` | Consumer name | `consumer-0` |
> > | `producer` | Producer name | `producer-0` |
> >
> > ## Public-facing Changes
> >
> > ### Configuration
> >
> > A new Lombok configuration is added to `lombok.config`:
> >
> > ```properties
> > lombok.log.custom.declaration = io.github.merlimat.slog.Logger
> > io.github.merlimat.slog.Logger.get(TYPE)
> > ```
> >
> > ### Log output
> >
> > The log message format changes. With traditional SLF4J:
> > ```
> > [ManagedLedgerImpl] [my-topic] Opening ledger 12345 with 100 entries
> > ```
> >
> > With slog (pattern appender):
> > ```
> > [ManagedLedgerImpl] Opening ledger {managedLedger=my-topic,
> ledgerId=12345,
> > entries=100}
> > ```
> >
> > With slog (Log4j2 JSON appender):
> > ```json
> > {
> >   "instant": {"epochSecond": 1775174210, "nanoOfSecond": 277000000},
> >   "thread": "pulsar-ordered-executor-0-0",
> >   "level": "INFO",
> >   "loggerName": "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl",
> >   "message": "Opening ledger",
> >   "contextMap": {
> >     "managedLedger": "my-topic",
> >     "ledgerId": "12345",
> >     "entries": "100"
> >   },
> >   "endOfBatch": false,
> >   "loggerFqcn": "io.github.merlimat.slog.impl.EventImpl",
> >   "threadId": 42,
> >   "threadPriority": 5
> > }
> > ```
> >
> > Users who rely on parsing specific log message patterns will need to
> update
> > their parsers. However, the structured
> > format makes parsing significantly easier and more reliable.
> >
> > # Monitoring
> >
> > No changes to monitoring. Structured logging improves the ability to
> create
> > log-based alerts and dashboards by
> > making attribute-based filtering possible (e.g., alert on all `ERROR`
> logs
> > where `managedLedger=X`).
> >
> > # Security Considerations
> >
> > No security implications. This is an internal change to how log messages
> > are formatted. No new endpoints, commands,
> > or permissions are introduced.
> >
> > # Backward & Forward Compatibility
> >
> > ## Upgrade
> >
> > * Log message formats will change as modules are migrated. Users who
> parse
> > log messages with regex patterns will
> >   need to update their parsers. The structured format is strictly more
> > parseable than the previous format.
> > * To get the most out of structured logging, users should consider
> > switching to a JSON log appender if they haven't
> >   already. However, this is optional — traditional pattern-based
> appenders
> > continue to work.
> > * No configuration changes are required for the upgrade.
> >
> > ## Downgrade / Rollback
> >
> > No special steps required. Reverting to a previous version restores the
> > original SLF4J logging.
> >
> > ## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations
> >
> > No impact on geo-replication. Logging is a local concern and does not
> > affect inter-cluster communication.
> >
> > # Alternatives
> >
> > ### Continue using `structured-event-log`
> >
> > The existing internal module could be extended. However, it lacks
> > debug/trace levels, has a more verbose API, and
> > maintaining a custom logging library inside Pulsar adds unnecessary
> burden.
> > slog is a superset of the
> > `structured-event-log` API with better performance and more features.
> >
> > ### Use SLF4J 2.0 Fluent API
> >
> > SLF4J 2.0 introduced a fluent logging API (`log.atInfo().addKeyValue("k",
> > v).log("msg")`). However:
> > * It does not support derived loggers with persistent context.
> > * It does not support context composition across components.
> > * No support for timed events or sampling.
> > * Attribute values are stored as strings only (no primitive overloads).
> > * Level guards still allocate objects in some implementations.
> >
> > ### Use Log4j2 directly
> >
> > Log4j2 does not expose a public structured logging API for applications.
> > While it is possible to pass structured
> > key-value data through low-level, non-public Log4j2 interfaces (which is
> > what slog does internally as its
> > backend), this is not something easily or reliably usable directly from
> > application code. It would also tightly
> > couple Pulsar to Log4j2 as the only backend, preventing users from using
> > other SLF4J-compatible backends.
> > slog abstracts over these internals and supports both Log4j2 and SLF4J
> > backends transparently.
> >
> > ### Use Flogger
> >
> > [Flogger](https://github.com/google/flogger) is Google's fluent logging
> > API
> > for Java. While it offers a fluent API
> > and some structured logging features, it has significant performance
> > drawbacks:
> > * Extremely slow on the enabled path (~0.7 ops/μs vs slog's 13–25 ops/μs
> —
> > roughly **18–35× slower**).
> > * Very high allocation rate (1624–1904 B/op vs slog's 0–80 B/op —
> **20–24×
> > more allocations**).
> > * No support for derived loggers with persistent context or context
> > composition.
> > * No support for timed events.
> > * Tightly coupled to Google's internal infrastructure; limited backend
> > flexibility.
> >
> > ### Performance comparison
> >
> > Benchmarks from the [slog repository](https://github.com/merlimat/slog)
> > (JMH, higher ops/μs is better, lower B/op
> > is better):
> >
> > **Disabled path** (log level is off — measures the cost of checking the
> > level):
> >
> > | Library              | ops/μs  |
> > |----------------------|---------|
> > | slog                 | 1005.4  |
> > | Log4j2               | 413.1   |
> > | SLF4J                | 367.5   |
> > | Flogger              | 360.3   |
> >
> > slog's disabled path is **2.4× faster** than Log4j2 and **2.8× faster**
> > than SLF4J/Flogger. This matters because
> > the vast majority of debug/trace log statements are disabled in
> production.
> >
> > **Enabled path** (log level is on — measures the cost of formatting and
> > emitting a log message with 2 key-value
> > attributes):
> >
> > | Library              | ops/μs |
> > |----------------------|--------|
> > | slog (simple)        | 24.6   |
> > | Log4j2 (simple)      | 14.6   |
> > | slog (fluent)        | 13.2   |
> > | slog (fluent+ctx)    | 12.0   |
> > | SLF4J (simple)       | 11.2   |
> > | SLF4J (fluent)       | 4.2    |
> > | Flogger              | 0.7    |
> >
> > slog is the fastest in both simple and fluent modes. Flogger is **18–35×
> > slower** than slog.
> >
> > **Allocation rate** (bytes allocated per log operation):
> >
> > | Library              | B/op |
> > |----------------------|------|
> > | slog (simple)        | 0    |
> > | Log4j2 (simple)      | 24   |
> > | SLF4J (simple)       | 24   |
> > | slog (fluent+ctx)    | 40   |
> > | slog (fluent)        | 80   |
> > | SLF4J (fluent)       | 1104 |
> > | Flogger              | 1624 |
> >
> > slog's fluent API allocates **6× less** than Log4j2's fluent API, **14×
> > less** than SLF4J's fluent API, and
> > **20× less** than Flogger. Since structured logging requires the fluent
> > API, this is the most relevant comparison
> > for this PIP's use case.
> >
> > # General Notes
> >
> > The `managed-ledger` module has already been fully migrated as a proof of
> > concept, converting all source and test files
> > (54 files, ~400 log statements). The migration reduced total lines of
> code
> > (1481 additions vs 1392 deletions) while
> > making every log statement more informative and consistent.
> >
> > # Links
> >
> > * Original PIP-89:
> > https://github.com/apache/pulsar/blob/master/pip/pip-89.md
> > * slog library: https://github.com/merlimat/slog
> > * PoC: managed-ledger migration:
> > https://github.com/merlimat/pulsar/pull/20
> >
> >
> > --
> > Matteo Merli
> > <[email protected]>
> >
>

Re: [DISCUSS] PIP-467: Adopt slog for structured logging across Pulsar

Reply via email to