poorbarcode commented on code in PR #20493: URL: https://github.com/apache/pulsar/pull/20493#discussion_r1218143352
##########
pip/pip-274.md:
##########
@@ -0,0 +1,126 @@
+# Background knowledge
+
+Apache Pulsar is a distributed messaging system that supports multiple messaging protocols and storage methods. Among its features, Pulsar Topic Compaction is a mechanism that removes superseded messages from topics to reduce storage space and improve system efficiency.
+More details can be found in [Pulsar Topic Compaction](https://pulsar.apache.org/docs/en/concepts-topic-compaction/).
+
+# Motivation
+
+Currently, the implementation of Pulsar Topic Compaction is fixed and does not support custom strategies, which prevents users from applying other compaction policies in their applications.
+
+
+For example, we need to parse the Kafka format and then compact messages in KoP, but the current implementation of Pulsar topic compaction does not support this.
+In addition, the topic compaction logic implemented in `TwoPhaseCompactor` retains only the latest message for each key, but sometimes we need to keep the first valid message, e.g., [`StrategicTwoPhaseCompactor`](https://github.com/coderzc/pulsar/blob/0e9935c493060b13b322a84c5418146423992369/pulsar-broker/src/main/java/org/apache/pulsar/compaction/StrategicTwoPhaseCompactor.java).
+
+So we need to make the topic compactor pluggable to support more compaction strategies.
+
+# Goals
+
+## In Scope
+
+<!--
+What this PIP intends to achieve once it's integrated into Pulsar.
+Why it benefits Pulsar.
+-->
+
+Make the compactor pluggable.
+
+## Out of Scope
+
+<!--
+Describe what you have decided to keep out of scope, perhaps left for a different PIP/s.
+-->
+
+
+# High Level Design
+
+<!--
+Describe the design of your solution in *high level*.
+Describe the solution end to end, from a birds-eye view.
+Don't go into implementation details in this section.
+
+I should be able to finish reading from beginning of the PIP to here (including) and understand the feature and
+how you intend to solve it, end to end.
+
+DON'T
+* Avoid code snippets, unless it's essential to explain your intent.
+-->
+
+Make the topic compactor pluggable so that users can customize the compactor implementation for their own scenarios.
+
+
+# Detailed Design
+
+## Design & Implementation Details
+
+<!--
+This is the section where you dive into the details. It can be:
+* Concrete class names and their roles and responsibility, including methods.
+* Code snippets of existing code.
+* Interface names and its methods.
+* ...
+-->
+* Define a standard `Compactor` interface that specifies the methods a compactor implementation must provide. This interface should include methods for compactor initialization, compactor execution, and retrieving compactor stats.
+```java
+public interface Compactor {
+
+    void initialize(ServiceConfiguration conf,
+                    PulsarClient pulsar,
+                    BookKeeper bk,
+                    ScheduledExecutorService scheduler);
+
+    CompletableFuture<Long> compact(String topic);
+
+    CompactorMXBean getStats();
+}
+```
+
+* Rename `org.apache.pulsar.compaction.Compactor` to `org.apache.pulsar.compaction.AbstractCompactor` and make it implement the `Compactor` interface.
+
+* Load the custom compactor based on configuration in `org.apache.pulsar.broker.PulsarService.newCompactor` and `CompactorTool`.
+
+## Public-facing Changes
+
+<!--
+Describe the additions you plan to make for each public facing component.
+Remove the sections you are not changing.
+Clearly mark any changes which are BREAKING backward compatibility.
+-->
+
+
+### Configuration

Review Comment:
   If I want `namespace A` to use `TwoPhaseCompactor` and `namespace B` to use `KafkaCompactor`, how can I configure that?
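
One way the question above could be approached, sketched under assumptions rather than taken from the PIP: a per-namespace override takes precedence over a broker-wide default class name, and the chosen class is then loaded reflectively. The PIP does not define any namespace-level setting, so the override map, `SketchCompactor`, `TwoPhaseSketch`, and `KafkaSketch` are all hypothetical stand-ins, and the Pulsar-specific parameters of the proposed interface are omitted to keep the sketch self-contained.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-in for the proposed Compactor interface
// (Pulsar types such as ServiceConfiguration are left out on purpose).
interface SketchCompactor {
    CompletableFuture<Long> compact(String topic);
}

// Hypothetical implementation standing in for the default strategy.
class TwoPhaseSketch implements SketchCompactor {
    public CompletableFuture<Long> compact(String topic) {
        return CompletableFuture.completedFuture(1L);
    }
}

// Hypothetical implementation standing in for a Kafka-format strategy.
class KafkaSketch implements SketchCompactor {
    public CompletableFuture<Long> compact(String topic) {
        return CompletableFuture.completedFuture(2L);
    }
}

class CompactorSelectionSketch {

    // A namespace-level override wins; otherwise fall back to the broker default.
    static String resolveClassName(String namespace,
                                   Map<String, String> namespaceOverrides,
                                   String brokerDefault) {
        return namespaceOverrides.getOrDefault(namespace, brokerDefault);
    }

    // Instantiate the configured class through its no-argument constructor.
    static SketchCompactor load(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(SketchCompactor.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        String brokerDefault = TwoPhaseSketch.class.getName();
        // Assumed configuration shape: namespace name -> compactor class name.
        Map<String, String> overrides = Map.of("tenant/ns-b", KafkaSketch.class.getName());

        for (String ns : new String[] {"tenant/ns-a", "tenant/ns-b"}) {
            SketchCompactor compactor = load(resolveClassName(ns, overrides, brokerDefault));
            System.out.println(ns + " -> " + compactor.getClass().getSimpleName());
        }
    }
}
```

Whether such an override would live in namespace policies or in broker configuration is left open here; the sketch only shows that the lookup and the reflective load can stay independent of where the setting is stored.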
