This is an automated email from the ASF dual-hosted git repository.
mmerli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 5451921cd49 [improve] PIP-384: ManagedLedger interface decoupling
(#23363)
5451921cd49 is described below
commit 5451921cd49dca03c541617c92ee8a3c83af9e50
Author: Lari Hotari <[email protected]>
AuthorDate: Mon Oct 7 18:37:55 2024 +0300
[improve] PIP-384: ManagedLedger interface decoupling (#23363)
---
pip/pip-384.md | 158 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 158 insertions(+)
diff --git a/pip/pip-384.md b/pip/pip-384.md
new file mode 100644
index 00000000000..ba02a147d85
--- /dev/null
+++ b/pip/pip-384.md
@@ -0,0 +1,158 @@
+# PIP-384: ManagedLedger interface decoupling
+
+## Background knowledge
+
+Apache Pulsar uses a component called ManagedLedger to handle persistent
storage of messages.
+
+The ManagedLedger interfaces and implementation were initially tightly
coupled, making it difficult to introduce alternative implementations or
improve the architecture.
+This PIP documents changes that have been made in the master branch for Pulsar
4.0. Pull Requests [#22891](https://github.com/apache/pulsar/pull/22891) and
[#23311](https://github.com/apache/pulsar/pull/23311) have already been merged.
+This work happened after lazy consensus on the dev mailing list based on the
discussion thread ["Preparing for Pulsar 4.0: cleaning up the Managed Ledger
interfaces"](https://lists.apache.org/thread/l5zjq0fb2dscys3rsn6kfl7505tbndlx).
+There is one remaining PR
[#23313](https://github.com/apache/pulsar/pull/23313) at the time of writing
this document.
+The goal of this PIP is to document the changes in this area for later
reference.
+
+Key concepts:
+
+- **ManagedLedger**: A component that handles the persistent storage of
messages in Pulsar.
+- **BookKeeper**: The default storage system used by ManagedLedger.
+- **ManagedLedgerStorage interface**: A factory for configuring and creating
the `ManagedLedgerFactory` instance. [ManagedLedgerStorage.java source
code](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/storage/ManagedLedgerStorage.java)
+- **ManagedLedgerFactory interface**: Creates and manages ManagedLedger
instances. [ManagedLedgerFactory.java source
code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedgerFactory.java)
+- **ManagedLedger interface**: Handles the persistent storage of messages in
Pulsar. [ManagedLedger.java source
code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedger.java)
+- **ManagedCursor interface**: Handles the persistent storage of Pulsar
subscriptions and related message acknowledgements. [ManagedCursor.java source
code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedCursor.java)
+
+## Motivation
+
+The current ManagedLedger implementation faces several challenges:
+
+1. **Tight coupling**: The interfaces are tightly coupled with their
implementation, making it difficult to introduce alternative implementations.
+
+2. **Limited flexibility**: The current architecture doesn't allow for easy
integration of different storage systems or optimizations.
+
+3. **Dependency on BookKeeper**: The ManagedLedger implementation is closely
tied to BookKeeper, limiting options for alternative storage solutions.
+
+4. **Complexity**: The tight coupling increases the overall complexity of the
system, making it harder to maintain, test and evolve.
+
+5. **Limited extensibility**: Introducing new features or optimizations often
requires changes to both interfaces and implementations.
+
+## Goals
+
+### In Scope
+
+- Decouple ManagedLedger interfaces from their current implementation.
+- Introduce a ReadOnlyManagedLedger interface.
+- Decouple OpAddEntry and LedgerHandle from ManagedLedgerInterceptor.
+- Enable support for multiple ManagedLedgerFactory instances.
+- Decouple BookKeeper client from ManagedLedgerStorage.
+- Improve overall architecture by reducing coupling between core Pulsar
components and specific ManagedLedger implementations.
+- Prepare the groundwork for alternative ManagedLedger implementations in
Pulsar 4.0.
+
+### Out of Scope
+
+- Implementing alternative ManagedLedger storage backends.
+- Changes to external APIs or behaviors.
+- Comprehensive JavaDocs for the interfaces.
+
+## High Level Design
+
+1. **Decouple interfaces from implementations**:
+ - Move required methods from implementation classes to their respective
interfaces.
+ - Update code to use interfaces instead of concrete implementations.
+
+2. **Introduce ReadOnlyManagedLedger interface**:
+ - Extract this interface to decouple from ReadOnlyManagedLedgerImpl.
+ - Adjust code to use the new interface where appropriate.
+
+3. **Decouple ManagedLedgerInterceptor**:
+ - Introduce AddEntryOperation and LastEntryHandle interfaces.
+ - Adjust ManagedLedgerInterceptor to use these new interfaces.
+
+4. **Enable multiple ManagedLedgerFactory instances**:
+ - Modify ManagedLedgerStorage interface to support multiple "storage
classes".
+ - Implement BookkeeperManagedLedgerStorageClass for BookKeeper support.
+ - Update PulsarService and related classes to support multiple
ManagedLedgerFactory instances.
+ - Add a "storage class" setting to the persistence policies at the namespace
level or topic level.
+
+5. **Decouple BookKeeper client**:
+ - Move BookKeeper client creation and management to
BookkeeperManagedLedgerStorageClass.
+ - Update ManagedLedgerStorage interface to remove direct BookKeeper
dependencies.
+
+## Detailed Design
+
+### Interface Decoupling
+
+1. Update ManagedLedger interface:
+ - Add methods from ManagedLedgerImpl to the interface.
+ - Remove dependencies on implementation-specific classes.
+
+2. Update ManagedLedgerFactory interface:
+ - Add necessary methods from ManagedLedgerFactoryImpl.
+ - Remove dependencies on implementation-specific classes.
+
+3. Update ManagedCursor interface:
+ - Add required methods from ManagedCursorImpl.
+ - Remove dependencies on implementation-specific classes.
+
+4. Introduce ReadOnlyManagedLedger interface:
+ - Extract methods specific to read-only operations.
+ - Update relevant code to use this interface where appropriate.
+
+5. Decouple ManagedLedgerInterceptor:
+ - Introduce AddEntryOperation interface for beforeAddEntry method.
+ - Introduce LastEntryHandle interface for
onManagedLedgerLastLedgerInitialize method.
+ - Update ManagedLedgerInterceptor to use these new interfaces.
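
The interceptor decoupling described above can be illustrated with a minimal sketch. All type names below (`AddEntryOperationView`, `InterceptorSketch`, `SimpleAddEntryOperation`, `CountPrefixInterceptor`) are hypothetical stand-ins invented for illustration and are not the actual Pulsar interfaces. The point is only that the interceptor is written against a narrow interface, so the concrete operation class can change without affecting it.

```java
// Hypothetical, simplified stand-ins; NOT the actual Pulsar interfaces.

// Narrow view of an add-entry operation that an interceptor is allowed to see.
interface AddEntryOperationView {
    byte[] getData();
    void setData(byte[] data);
}

// The interceptor depends only on the narrow view, never on a concrete class.
interface InterceptorSketch {
    void beforeAddEntry(AddEntryOperationView op, int numberOfMessages);
}

// One possible implementation-side class; it can evolve independently.
final class SimpleAddEntryOperation implements AddEntryOperationView {
    private byte[] data;

    SimpleAddEntryOperation(byte[] data) {
        this.data = data;
    }

    @Override
    public byte[] getData() {
        return data;
    }

    @Override
    public void setData(byte[] data) {
        this.data = data;
    }
}

// Example interceptor: prepends a one-byte message count to the payload.
final class CountPrefixInterceptor implements InterceptorSketch {
    @Override
    public void beforeAddEntry(AddEntryOperationView op, int numberOfMessages) {
        byte[] original = op.getData();
        byte[] prefixed = new byte[original.length + 1];
        prefixed[0] = (byte) numberOfMessages;
        System.arraycopy(original, 0, prefixed, 1, original.length);
        op.setData(prefixed);
    }
}
```

Because `CountPrefixInterceptor` never names `SimpleAddEntryOperation`, an alternative ManagedLedger implementation could supply its own operation class and reuse the same interceptor.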
+
+### Multiple ManagedLedgerFactory Instances
+
+1. Update ManagedLedgerStorage interface:
+ - Add methods to support multiple storage classes.
+ - Introduce getManagedLedgerStorageClass method to retrieve specific
storage implementations.
+
+2. Implement BookkeeperManagedLedgerStorageClass:
+ - Create a new class implementing ManagedLedgerStorageClass for BookKeeper.
+ - Move BookKeeper client creation and management to this class.
+
+3. Update PulsarService and related classes:
+ - Modify to support creation and management of multiple
ManagedLedgerFactory instances.
+ - Update configuration to allow specifying different storage classes for
different namespaces or topics.
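
As a rough illustration of how several factories could be selected by storage class name, the following sketch keeps a registry of named storage classes and falls back to a default. Every name here (`LedgerFactorySketch`, `StorageClassSketch`, `NamedStorageClass`, `StorageRegistrySketch`) is hypothetical, and the fallback policy for a missing or unknown name is an assumption made for the example, not behavior confirmed by this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a factory; the real ManagedLedgerFactory is far richer.
interface LedgerFactorySketch {
    String describe();
}

// A named storage class that owns exactly one factory instance.
interface StorageClassSketch {
    String getName();
    LedgerFactorySketch getFactory();
}

final class NamedStorageClass implements StorageClassSketch {
    private final String name;
    private final LedgerFactorySketch factory;

    NamedStorageClass(String name) {
        this.name = name;
        this.factory = () -> "factory-for-" + name;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public LedgerFactorySketch getFactory() {
        return factory;
    }
}

// Registry that resolves a requested storage class name to a factory.
final class StorageRegistrySketch {
    private final Map<String, StorageClassSketch> byName = new LinkedHashMap<>();
    private final String defaultName;

    StorageRegistrySketch(String defaultName) {
        this.defaultName = defaultName;
    }

    void register(StorageClassSketch storageClass) {
        byName.put(storageClass.getName(), storageClass);
    }

    // Assumed policy: a null or unknown name falls back to the default storage class.
    LedgerFactorySketch factoryFor(String requestedName) {
        StorageClassSketch selected = requestedName == null ? null : byName.get(requestedName);
        if (selected == null) {
            selected = byName.get(defaultName);
        }
        if (selected == null) {
            throw new IllegalStateException("default storage class not registered: " + defaultName);
        }
        return selected.getFactory();
    }
}
```

In this sketch, a namespace or topic policy would carry only the storage class name, and the broker-side code would call something like `factoryFor(name)` when opening a topic.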
+
+### BookKeeper Client Decoupling
+
+1. Update ManagedLedgerStorage interface:
+ - Remove direct dependencies on BookKeeper client.
+ - Introduce methods to interact with storage without exposing BookKeeper
specifics.
+
+2. Implement BookkeeperManagedLedgerStorageClass:
+ - Encapsulate BookKeeper client creation and management.
+ - Implement storage operations using BookKeeper client.
+
+3. Update relevant code:
+ - Replace direct BookKeeper client usage with calls to ManagedLedgerStorage
methods.
+ - Update configuration handling to support BookKeeper-specific settings
through the new storage class.
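
The encapsulation idea in this section can be sketched as a storage class that creates and owns its client and exposes only a storage-neutral interface. The names `StorageClientSketch`, `EntryReaderSketch`, and `ClientOwningStorageClass` are hypothetical; `StorageClientSketch` merely stands in for a client such as the BookKeeper client and does not model its real API.

```java
// Hypothetical stand-in for a storage client (for example, a BookKeeper client).
final class StorageClientSketch implements AutoCloseable {
    private boolean closed;

    String read(String key) {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        return "entry:" + key;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Storage-neutral interface; callers never see the client type.
interface EntryReaderSketch {
    String readEntry(String key);
}

// The storage class creates, owns, and closes the client itself.
final class ClientOwningStorageClass implements AutoCloseable {
    private final StorageClientSketch client = new StorageClientSketch();

    EntryReaderSketch getReader() {
        // Expose only the neutral interface, backed by the private client.
        return client::read;
    }

    @Override
    public void close() {
        client.close();
    }
}
```

With this shape, code outside the storage class has no compile-time dependency on the client type, and the client's lifecycle is tied to the storage class that created it.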
+
+## Public-facing Changes
+
+### Configuration
+
+- Add a new configuration option to specify the default ManagedLedger "storage
class" at the broker level.
+
+### API Changes
+
+- No major changes to external APIs are planned.
+- The only API change is to add `managedLedgerStorageClassName` to
`PersistencePolicies` which can be used by a custom `ManagedLedgerStorage` to
control the ManagedLedgerFactory instance that is used for a particular
namespace or topic.
+
+## Backward & Forward Compatibility
+
+The changes are internal. Apart from the new `managedLedgerStorageClassName`
field in `PersistencePolicies`, they don't affect external APIs or behaviors.
+Backward compatibility is fully preserved in Apache Pulsar.
+
+## Security Considerations
+
+The decoupling of interfaces and implementation doesn't introduce new security
concerns.
+
+## Links
+
+- Initial mailing list discussion thread: [Preparing for Pulsar 4.0: cleaning
up the Managed Ledger
interfaces](https://lists.apache.org/thread/l5zjq0fb2dscys3rsn6kfl7505tbndlx)
+ - Merged Pull Request #22891: [Replace dependencies on PositionImpl with
Position interface](https://github.com/apache/pulsar/pull/22891)
+ - Merged Pull Request #23311: [Decouple ManagedLedger interfaces from the
current implementation](https://github.com/apache/pulsar/pull/23311)
+ - Implementation Pull Request #23313: [Decouple Bookkeeper client from
ManagedLedgerStorage and enable multiple ManagedLedgerFactory
instances](https://github.com/apache/pulsar/pull/23313)
+- Mailing List PIP discussion thread:
https://lists.apache.org/thread/rtnktrj7tp5ppog0235t2mf9sxrdpfr8
+- Mailing List PIP voting thread:
https://lists.apache.org/thread/4jj5dmk6jtpq05lcd6dxlkqpn7hov5gv
\ No newline at end of file