Copilot commented on code in PR #24377:
URL: https://github.com/apache/pulsar/pull/24377#discussion_r2366953481
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
Review Comment:
Define the semantics precisely: does the index count individual messages
within batches or per-entry? Is it scoped per-partition or across a partitioned
topic? Are control/transaction/compaction markers included? How do retention,
deletion, and offload affect continuity? Clarifying these is critical for
correctness and interoperability.
```suggestion
- **Message Index**: A monotonically increasing counter (0-based) that
uniquely identifies each user message within a single partition of a topic.
- The index counts individual user messages, not batch entries or broker
control/transaction/compaction markers.
- The index is scoped per partition; each partition maintains its own
independent sequence of indexes.
- Retention, deletion, and offload do not affect the continuity of the
index: once assigned, indexes are never reused, and gaps may exist if messages
are deleted or offloaded.
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger`
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+ CompletableFuture<MessageId> getMessageIdByIndex(long index);
Review Comment:
ManagedLedger is an internal storage-layer API that typically uses
Position/PositionImpl, not the client-facing MessageId type. Returning
MessageId introduces a layering dependency on the client API and may not fit
custom engines; prefer returning Position (or an internal identifier) and let
higher layers translate to MessageId. Suggested signature:
CompletableFuture<Position> getPositionByIndex(long index).
```suggestion
CompletableFuture<Position> getPositionByIndex(long index);
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger`
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+ CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+
+# Backward & Forward Compatibility
+
Review Comment:
This section is empty. Please document how adding a new method to the
ManagedLedger interface will avoid breaking existing implementations (e.g.,
introduce a default method with UnsupportedOperationException, or a new
sub-interface like IndexedManagedLedger and capability detection), and outline
migration steps.
```suggestion
## Avoiding Breaking Changes
Adding a new method to the `ManagedLedger` interface can break existing
implementations that do not provide an implementation for the new method. To
avoid this:
- **Default Method:** The new `getMessageIdByIndex(long index)` method will
be added as a `default` method in the interface, which throws
`UnsupportedOperationException` by default. This ensures that existing
implementations will continue to compile and run, but will throw an exception
if the new method is called and not overridden.
```java
default CompletableFuture<MessageId> getMessageIdByIndex(long index) {
throw new UnsupportedOperationException("getMessageIdByIndex is not
implemented");
}
```
- **Alternative: Sub-interface:** Alternatively, a new sub-interface (e.g.,
`IndexedManagedLedger`) can be introduced for implementations that support this
feature. Capability detection can be used to check if an instance supports the
new method.
## Migration Steps
1. **Update Interface:** Add the default method to the `ManagedLedger`
interface.
2. **Update Implementations:** Implementers of `ManagedLedger` can override
the new method to provide the actual logic. Existing implementations that do
not override the method will throw `UnsupportedOperationException` if the
method is called.
3. **Client Code:** Client code should check for
`UnsupportedOperationException` when calling the new method, or use capability
detection if a sub-interface is used.
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger`
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+ CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+
+# Backward & Forward Compatibility
+
+# Links
+
+* Mailing List discussion thread:
+* Mailing List voting thread:
Review Comment:
Please add the required discussion and voting thread links to comply with
the PIP process.
```suggestion
* Mailing List discussion thread:
https://lists.apache.org/thread/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
* Mailing List voting thread:
https://lists.apache.org/thread/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger`
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
Review Comment:
The design sections are too sparse. Please outline how index→position
mapping will be built and queried efficiently (e.g., ledger-level index ranges
cached in memory and persisted in LedgerInfo for O(log N) lookup), expected
complexity, memory overhead, impact on writes/rollovers, and recovery strategy
(building/rebuilding index metadata after failures).
```suggestion
### Index→Position Mapping
- **Index Range Tracking:**
Each BookKeeper ledger managed by ManagedLedger will be associated with a
contiguous range of message indexes, e.g., `[startIndex, endIndex)`. These
ranges will be tracked in memory and persisted as part of the `LedgerInfo`
metadata for each ledger.
- **LedgerInfo Extension:**
The `LedgerInfo` structure will be extended to include the starting global
message index for the ledger. The end index can be inferred from the next
ledger's start index or the current highest index.
- **In-Memory Cache:**
On startup, ManagedLedger will load all `LedgerInfo` objects and build an
in-memory sorted list (or array) of `(startIndex, ledgerId)` pairs. This
enables efficient binary search (O(log N), where N is the number of ledgers) to
map a global message index to the corresponding ledger.
- **Querying:**
To resolve `getMessageIdByIndex(index)`, perform a binary search over the
in-memory index range list to find the ledger containing the index, then
compute the entryId within the ledger as `entryId = index - startIndex`. The
resulting `MessageId` is `(ledgerId, entryId)`.
- **Complexity:**
- Lookup: O(log N) for binary search over ledgers, O(1) arithmetic for
entryId.
- Memory: O(N) for the in-memory index range list, where N is the number
of ledgers (typically small).
- **Impact on Writes/Rollovers:**
- On ledger rollover, the new ledger's `LedgerInfo` is created with the
next available global index as its start index.
- The in-memory index range list is updated atomically with the new ledger.
- No per-message overhead is introduced; only per-ledger metadata is
updated.
- **Persistence:**
- The start index for each ledger is persisted in `LedgerInfo` in the
metadata store (e.g., ZooKeeper).
- This ensures that the mapping can be reconstructed after broker restarts
or failures.
- **Recovery Strategy:**
- On recovery, the broker loads all `LedgerInfo` objects and reconstructs
the in-memory index range list.
- If index metadata is missing or inconsistent (e.g., after an unclean
shutdown), the system can scan the ledgers sequentially, counting entries to
reconstruct the correct index ranges, and update the `LedgerInfo` accordingly.
- **Custom Engine Support:**
- Alternative `ManagedLedger` implementations (e.g., Ursa) can provide
their own efficient mapping and recovery strategies, but must expose the same
API and semantics.
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
+
+# High Level Design
+
+1. **Custom Engine Support**: Alternative implementations via `ManagedLedger`
interface
+
+# Detailed Design
+
+## Design & Implementation Details
+
+## Public-facing Changes
+
+**1. `ManagedLedger` Interface Change**
+
+```java
+public interface ManagedLedger {
+ CompletableFuture<MessageId> getMessageIdByIndex(long index);
+}
+```
+
+### Public API
+
+### Binary protocol
+
+### Configuration
+
+### CLI
+
+### Metrics
+
Review Comment:
These public-facing sections are placeholders with no content. Either
explicitly state 'No changes' for each area or document the expected impacts
(e.g., any new exceptions, client API mapping, metrics for index lookups, and
whether CLI/admin endpoints will expose this capability).
```suggestion
No changes
### Binary protocol
No changes
### Configuration
No changes
### CLI
No changes
### Metrics
No changes
```
##########
pip/pip-424.md:
##########
@@ -0,0 +1,59 @@
+# PIP-424: Support Getting Message ID by Index in ManagedLedger Interface
+
+# Background knowledge
+
+- **ManagedLedger**: Core Pulsar component managing message storage in
BookKeeper ledgers
+- **Message Index**: Global monotonically increasing counter for messages in a
topic (0-based)
+- **LedgerInfo**: Metadata stored per ledger in ZK/metastore (ledgerID, size,
etc.)
+
+# Motivation
+
+Current approaches to get message ID by index rely on **broker entry
metadata** (PIP-415), but custom ManagedLedger
+implementations (like StreamNative's Ursa engine) may not use broker entry
metadata for tracking message indexes.
+
+So it is necessary to add `getMessageIdByIndex()` in the **ManagedLedger**
interface directly.
+
+# Goals
+
+## In Scope
+
+- Add `getMessageIdByIndex()` to `ManagedLedger`
+
+## Out of Scope
+
+NA
Review Comment:
Use 'N/A' instead of 'NA' for consistency and clarity, or list specific
out-of-scope items.
```suggestion
N/A
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]