https://github.com/apache/pulsar/pull/25462

# Background knowledge

[PIP-89](https://github.com/apache/pulsar/blob/master/pip/pip-89.md)
introduced the concept of structured event
logging for Pulsar and was approved in 2021. It proposed replacing
SLF4J string-interpolated logging with structured
key-value logging, using a custom wrapper (`structured-event-log`
module) built on top of SLF4J's MDC.

Since then, a standalone library called
[slog](https://github.com/merlimat/slog) has been developed that
provides a
more complete and performant implementation of structured logging.
slog builds on the same principles as PIP-89 but
with a significantly improved API, better performance characteristics,
and support for all standard log levels.

### Structured logging

Traditional logging uses string interpolation to embed values into log messages:

```java
log.info("[{}] Opening ledger {} with {} entries", name, ledgerId, entries);
```

Structured logging instead attaches context as typed key-value attributes:

```java
log.info().attr("ledgerId", ledgerId).attr("entries",
entries).log("Opening ledger");
```

This makes log records machine-parseable, filterable, and queryable
without complex regex patterns.

### slog API

slog provides a `Logger` interface with the following key features:

* **All standard levels**: `trace()`, `debug()`, `info()`, `warn()`,
`error()` — each with four variants:
  simple message, printf-formatted, fluent builder, and lambda/consumer-based.
* **Fluent Event builder**: Chain `.attr(key, value)` calls to attach
structured attributes, with primitive overloads
  (`long`, `int`, `double`, `float`, `boolean`) to avoid autoboxing.
* **Derived loggers with context**: `logger.with().attr("topic",
topic).build()` creates a child logger where
  context attributes are automatically included in every log call.
* **Context composition**: `logger.with().ctx(otherLogger).build()`
inherits context from another logger, enabling
  cross-component context propagation without coupling.
* **Zero overhead when disabled**: Level guards are checked before
constructing any objects. The fluent builders
  return a no-op singleton `Event` when the level is disabled, making
all chained calls effectively free.
* **Timed events**: `log.info().timed()` automatically records elapsed
duration between the call and `.log()`.
* **Lazy evaluation**: `log.debug(e -> e.attr(...).log(...))` — the
lambda only executes when the level is enabled.
* **Exception support**: `.exception(t)` for full stack traces,
`.exceptionMessage(t)` for message-only.
* **Lombok integration**: `@CustomLog` annotation generates the static
logger field automatically.

slog auto-discovers the backend at runtime: Log4j2 (preferred) or
SLF4J 1.x/2.x as fallback.

# Motivation

While PIP-89 was approved and the `structured-event-log` module was
added to the Pulsar codebase, adoption has been
very limited. Almost all Pulsar components still use traditional SLF4J
logging with string interpolation. The
`structured-event-log` module's API is verbose and lacks support for
standard log levels like `debug` and `trace`,
which has hindered adoption.

Meanwhile, the slog library has evolved to address these shortcomings
with a cleaner, more complete API. This PIP
proposes to adopt slog as the standard logging approach across all
Pulsar components, replacing both direct SLF4J
usage and the internal `structured-event-log` module.

### Problems with the current state

**Inconsistent log formatting**: Every developer formats log messages
differently. The same information appears in
different positions, with different delimiters, and different key naming:
```java
// Various inconsistent patterns found across the codebase:
log.info("[{}] Opening managed ledger {}", name, ledgerId);
log.info("[{}] [{}] Cursor {} recovered to position {}", ledgerName,
cursorName, cursor, position);
log.warn("Error opening ledger {} for cursor {}: {}", ledgerId,
cursorName, errorMsg);
```

**Repeated boilerplate context**: The managed ledger name, cursor
name, topic name, and other context identifiers
are repeated as positional arguments in nearly every log statement —
often as the first one or two `{}` placeholders.
This is verbose, error-prone, and frequently inconsistent.

**isDebugEnabled guards**: Debug-level logging requires explicit level
guards to avoid the cost of string
construction and argument evaluation:
```java
if (log.isDebugEnabled()) {
    log.debug("[{}] Reading entry ledger {}: {}", name,
position.getLedgerId(), position.getEntryId());
}
```
This adds significant boilerplate — in `ManagedLedgerImpl` alone,
there were 58 such guards.

**Poor queryability**: Log analysis tools (Splunk, Elasticsearch,
Loki) cannot reliably extract values embedded in
interpolated strings. Regex-based parsing is fragile and varies by
message format.

**Missing context in error scenarios**: When debugging production
issues, engineers frequently find that critical
context was omitted from log messages because adding another
positional argument was cumbersome.

### Why slog over `structured-event-log`

* **Complete level support**: slog supports all five standard levels
(`trace`, `debug`, `info`, `warn`, `error`),
  while `structured-event-log` only supports `info`, `warn`, and `error`.
* **Simpler API**: slog's `log.info().attr("k", v).log("msg")` is more
natural than
  `log.newRootEvent().resource("k", v).log("msg")`.
* **Better performance**: slog uses a cached generation-counter scheme
for level checks with a single integer
  comparison and no volatile fence. Primitive `attr()` overloads avoid
autoboxing.
* **Derived loggers**: `logger.with().attr(...)` creates an immutable
child logger with persistent context — cleaner
  than creating `EventResources` objects.
* **Lazy evaluation**: Lambda-based logging (`log.debug(e -> ...)`)
eliminates `isDebugEnabled` guards entirely.
* **Lombok integration**: `@CustomLog` works with slog, reducing
logger declaration boilerplate.
* **Standalone library**: slog is maintained as an independent
library, decoupled from Pulsar's release cycle.
  Being a separate library makes it easier to adopt across related
projects (e.g., BookKeeper, Oxia Java client
  which already uses slog), enabling consistent structured logging and
cross-project context propagation.

# Goals

## In Scope

* **Adopt slog** as the standard structured logging library for all
Pulsar components.
* **Remove the internal `structured-event-log` module** from the
Pulsar codebase, since slog supersedes it.
* **Migrate all components** from SLF4J to slog, one module at a time.
Components can be migrated independently
  and there is no concern with a coexistence period where some
components use SLF4J and others use slog.
* **Establish logging conventions**: standardized attribute names
(`managedLedger`, `cursor`, `ledgerId`, `topic`,
  `position`, etc.) to ensure consistency across the codebase.
* **Use derived loggers for context propagation**: Classes that carry
identity (e.g., a managed ledger's name, a
  cursor's name, a topic) should create derived loggers that
automatically attach this context to every log call.

* **Cross-boundary context propagation with BookKeeper**: Since Pulsar
embeds BookKeeper, slog's context
  composition (`ctx()`) enables sharing context across the Pulsar →
BookKeeper boundary. For example, a BookKeeper
  write failure event log would automatically carry the Pulsar topic
name as an attribute, without BookKeeper needing
  to know about Pulsar concepts. This dramatically improves
debuggability of storage-layer issues.

## Out of Scope

* Defining centralized event enums (as discussed in PIP-89). slog uses
string messages, and adopting an enum-based
  log event model would be highly impractical given the scale of the codebase.
* Modifying the log output format or Log4j2 configuration. slog works
transparently with existing logging backends
  and configurations.
* Introducing distributed tracing or trace ID propagation. This is
orthogonal and can be layered on top later.

### Pulsar client libraries

The `pulsar-client` module is a special case because its logs are
emitted within users' applications rather than
on the server side. slog auto-discovers the logging backend at runtime
(Log4j2 or SLF4J), so it will integrate
transparently with whatever logging framework the user's application
already has configured. No action is required
by users — the client will simply emit structured log records through
the existing backend, and users who configure
a JSON appender will automatically benefit from the structured attributes.

# High Level Design

The migration follows a straightforward, incremental approach:

1. **Add slog as a dependency** in the Pulsar build system (Gradle
version catalog).
2. **Configure Lombok** with `@CustomLog` support for slog.
3. **Migrate modules one at a time**, starting with `managed-ledger`
as the first module (already done as a
   proof of concept). Each module migration involves:
   - Replacing SLF4J imports and logger declarations with slog.
   - Converting log statements from parameterized format to structured format.
   - Introducing derived loggers for repeated context (e.g., ledger
name, cursor name).
   - Removing `isDebugEnabled()` / `isTraceEnabled()` guards.
4. **Delete the `structured-event-log` module** — it has no existing
usages and can be removed immediately.

Since slog auto-discovers the logging backend (Log4j2 or SLF4J), no
changes to logging configuration files are
needed. The structured attributes are emitted via the backend's native
mechanism (Log4j2 key-value pairs or SLF4J
MDC), so existing log appenders and formats continue to work. Users
who want to leverage structured attributes in
their log output can configure a JSON appender.

# Detailed Design

## Design & Implementation Details

### Logger declaration

For simple classes, use Lombok's `@CustomLog` annotation:

```java
@CustomLog
public class OpAddEntry implements AddCallback {
    // Lombok generates: private static final Logger log =
Logger.get(OpAddEntry.class);

    public void initiate() {
        log.info().attr("ledgerId", ledger.getId()).log("Initiating add entry");
    }
}
```

For classes with persistent identity context (e.g., a managed ledger,
a cursor, a topic handler), create a
derived instance logger:

```java
public class ManagedLedgerImpl implements ManagedLedger {
    private static final Logger slog = Logger.get(ManagedLedgerImpl.class);
    protected final Logger log;

    public ManagedLedgerImpl(..., String name, ...) {
        this.log = slog.with().attr("managedLedger", name).build();
    }

    // All subsequent log calls automatically include managedLedger=<name>
    public void openLedger(long ledgerId) {
        log.info().attr("ledgerId", ledgerId).log("Opening ledger");
        // Emits: msg="Opening ledger" managedLedger=my-topic ledgerId=12345
    }
}
```

### Context composition across components

Classes that operate on behalf of another component can inherit its context:

```java
public class RangeEntryCacheImpl implements EntryCache {
    private static final Logger slog = Logger.get(RangeEntryCacheImpl.class);
    private final Logger log;

    RangeEntryCacheImpl(ManagedLedgerImpl ml, ...) {
        this.log = slog.with().ctx(ml.getLogger()).build();
        // Logger name is RangeEntryCacheImpl, but carries
managedLedger context from ml
    }
}
```

### Cross-boundary context: Pulsar → BookKeeper

One of the most powerful benefits of slog's context model is the
ability to propagate context across component
boundaries. Since Pulsar embeds BookKeeper, a BookKeeper ledger
operation can inherit context from the Pulsar
component that initiated it:

```java
// In Pulsar's ManagedLedgerImpl (already has managedLedger context)
public void createLedger() {
    // Pass the Pulsar logger context to BookKeeper
    bookKeeper.asyncCreateLedger(..., log, ...);
}

// In BookKeeper's LedgerHandle
public class LedgerHandle {
    private final Logger log;

    LedgerHandle(Logger pulsarLogger, ...) {
        // BookKeeper's logger inherits Pulsar's context
        this.log = slog.with().ctx(pulsarLogger).attr("ledgerId",
ledgerId).build();
    }

    void handleWriteError(int rc) {
        // This log entry automatically includes
managedLedger=my-topic from Pulsar's context
        log.error().attr("rc", BKException.getMessage(rc)).log("Write failed");
        // Emits: msg="Write failed" managedLedger=my-topic
ledgerId=12345 rc=NotEnoughBookies
    }
}
```

This eliminates the long-standing problem of BookKeeper errors being
logged without any Pulsar-level context,
making it very difficult to correlate storage failures with the affected topics.

### Conversion patterns

| SLF4J pattern | slog equivalent |
|---|---|
| `log.info("msg")` | `log.info("msg")` |
| `log.info("[{}] msg {}", name, val)` | `log.info().attr("key",
val).log("msg")` |
| `log.error("msg", exception)` | `log.error().exception(e).log("msg")` |
| `log.warn("msg: {}", e.getMessage())` |
`log.warn().exceptionMessage(e).log("msg")` |
| `if (log.isDebugEnabled()) { log.debug(...) }` |
`log.debug().attr(...).log(...)` |
| Manual timing with `System.nanoTime()` | `log.info().timed() ...
event.log("done")` |

### Attribute naming conventions

To ensure consistency across the codebase, the following attribute
names should be used:

| Attribute | Description | Example value |
|---|---|---|
| `managedLedger` | Managed ledger name | `my-tenant/my-ns/my-topic` |
| `cursor` | Cursor name | `my-subscription` |
| `topic` | Topic name | `persistent://tenant/ns/topic` |
| `ledgerId` | BookKeeper ledger ID | `12345` |
| `entryId` | Entry ID within a ledger | `42` |
| `position` | Ledger position (ledgerId:entryId) | `12345:42` |
| `readPosition` | Current read position | `12345:42` |
| `markDeletePosition` | Mark-delete position | `12345:41` |
| `state` | Component state | `Open`, `Closed` |
| `rc` | BookKeeper return code | `OK`, `NoSuchLedger` |
| `subscription` | Subscription name | `my-sub` |
| `consumer` | Consumer name | `consumer-0` |
| `producer` | Producer name | `producer-0` |

## Public-facing Changes

### Configuration

A new Lombok configuration is added to `lombok.config`:

```properties
lombok.log.custom.declaration = io.github.merlimat.slog.Logger
io.github.merlimat.slog.Logger.get(TYPE)
```

### Log output

The log message format changes. With traditional SLF4J:
```
[ManagedLedgerImpl] [my-topic] Opening ledger 12345 with 100 entries
```

With slog (pattern appender):
```
[ManagedLedgerImpl] Opening ledger {managedLedger=my-topic,
ledgerId=12345, entries=100}
```

With slog (Log4j2 JSON appender):
```json
{
  "instant": {"epochSecond": 1775174210, "nanoOfSecond": 277000000},
  "thread": "pulsar-ordered-executor-0-0",
  "level": "INFO",
  "loggerName": "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl",
  "message": "Opening ledger",
  "contextMap": {
    "managedLedger": "my-topic",
    "ledgerId": "12345",
    "entries": "100"
  },
  "endOfBatch": false,
  "loggerFqcn": "io.github.merlimat.slog.impl.EventImpl",
  "threadId": 42,
  "threadPriority": 5
}
```

Users who rely on parsing specific log message patterns will need to
update their parsers. However, the structured
format makes parsing significantly easier and more reliable.

# Monitoring

No changes to monitoring. Structured logging improves the ability to
create log-based alerts and dashboards by
making attribute-based filtering possible (e.g., alert on all `ERROR`
logs where `managedLedger=X`).

# Security Considerations

No security implications. This is an internal change to how log
messages are formatted. No new endpoints, commands,
or permissions are introduced.

# Backward & Forward Compatibility

## Upgrade

* Log message formats will change as modules are migrated. Users who
parse log messages with regex patterns will
  need to update their parsers. The structured format is strictly more
parseable than the previous format.
* To get the most out of structured logging, users should consider
switching to a JSON log appender if they haven't
  already. However, this is optional — traditional pattern-based
appenders continue to work.
* No configuration changes are required for the upgrade.

## Downgrade / Rollback

No special steps required. Reverting to a previous version restores
the original SLF4J logging.

## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

No impact on geo-replication. Logging is a local concern and does not
affect inter-cluster communication.

# Alternatives

### Continue using `structured-event-log`

The existing internal module could be extended. However, it lacks
debug/trace levels, has a more verbose API, and
maintaining a custom logging library inside Pulsar adds unnecessary
burden. slog is a superset of the
`structured-event-log` API with better performance and more features.

### Use SLF4J 2.0 Fluent API

SLF4J 2.0 introduced a fluent logging API
(`log.atInfo().addKeyValue("k", v).log("msg")`). However:
* It does not support derived loggers with persistent context.
* It does not support context composition across components.
* No support for timed events or sampling.
* Attribute values are stored as strings only (no primitive overloads).
* Level guards still allocate objects in some implementations.

### Use Log4j2 directly

Log4j2 does not expose a public structured logging API for
applications. While it is possible to pass structured
key-value data through low-level, non-public Log4j2 interfaces (which
is what slog does internally as its
backend), this is not something easily or reliably usable directly
from application code. It would also tightly
couple Pulsar to Log4j2 as the only backend, preventing users from
using other SLF4J-compatible backends.
slog abstracts over these internals and supports both Log4j2 and SLF4J
backends transparently.

### Use Flogger

[Flogger](https://github.com/google/flogger) is Google's fluent
logging API for Java. While it offers a fluent API
and some structured logging features, it has significant performance drawbacks:
* Extremely slow on the enabled path (~0.7 ops/μs vs slog's 13–25
ops/μs — roughly **18–35× slower**).
* Very high allocation rate (1624–1904 B/op vs slog's 0–80 B/op —
**20–24× more allocations**).
* No support for derived loggers with persistent context or context composition.
* No support for timed events.
* Tightly coupled to Google's internal infrastructure; limited backend
flexibility.

### Performance comparison

Benchmarks from the [slog
repository](https://github.com/merlimat/slog) (JMH, higher ops/μs is
better, lower B/op
is better):

**Disabled path** (log level is off — measures the cost of checking the level):

| Library              | ops/μs  |
|----------------------|---------|
| slog                 | 1005.4  |
| Log4j2               | 413.1   |
| SLF4J                | 367.5   |
| Flogger              | 360.3   |

slog's disabled path is **2.4× faster** than Log4j2 and **2.8×
faster** than SLF4J/Flogger. This matters because
the vast majority of debug/trace log statements are disabled in production.

**Enabled path** (log level is on — measures the cost of formatting
and emitting a log message with 2 key-value
attributes):

| Library              | ops/μs |
|----------------------|--------|
| slog (simple)        | 24.6   |
| Log4j2 (simple)      | 14.6   |
| slog (fluent)        | 13.2   |
| slog (fluent+ctx)    | 12.0   |
| SLF4J (simple)       | 11.2   |
| SLF4J (fluent)       | 4.2    |
| Flogger              | 0.7    |

slog is the fastest in both simple and fluent modes. Flogger is
**18–35× slower** than slog.

**Allocation rate** (bytes allocated per log operation):

| Library              | B/op |
|----------------------|------|
| slog (simple)        | 0    |
| Log4j2 (simple)      | 24   |
| SLF4J (simple)       | 24   |
| slog (fluent+ctx)    | 40   |
| slog (fluent)        | 80   |
| SLF4J (fluent)       | 1104 |
| Flogger              | 1624 |

slog's fluent API allocates **6× less** than Log4j2's fluent API,
**14× less** than SLF4J's fluent API, and
**20× less** than Flogger. Since structured logging requires the
fluent API, this is the most relevant comparison
for this PIP's use case.

# General Notes

The `managed-ledger` module has already been fully migrated as a proof
of concept, converting all source and test files
(54 files, ~400 log statements). The migration reduced total lines of
code (1481 additions vs 1392 deletions) while
making every log statement more informative and consistent.

# Links

* Original PIP-89: https://github.com/apache/pulsar/blob/master/pip/pip-89.md
* slog library: https://github.com/merlimat/slog
* PoC: managed-ledger migration: https://github.com/merlimat/pulsar/pull/20

Reply via email to