This is an automated email from the ASF dual-hosted git repository.

lhotari pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 9d0292ebb03 [improve][pip] PIP-352: Event time based topic compactor 
(#22710)
9d0292ebb03 is described below

commit 9d0292ebb034a624286a9ffdf992bb00085190e4
Author: Marek Czajkowski <[email protected]>
AuthorDate: Wed Jul 31 14:26:02 2024 +0200

    [improve][pip] PIP-352: Event time based topic compactor (#22710)
---
 pip/pip-352.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/pip/pip-352.md b/pip/pip-352.md
new file mode 100644
index 00000000000..31641e7e1e1
--- /dev/null
+++ b/pip/pip-352.md
@@ -0,0 +1,68 @@
+# PIP-352: Event time based topic compactor
+
+# Background knowledge
+
+Pulsar Topic Compaction provides a key-based data retention mechanism that 
allows you only to keep the most recent message associated with that key to 
reduce storage space and improve system efficiency.
+
+Another Pulsar's internal use case, the Topic Compaction of the new load 
balancer, changed the strategy of compaction. It only keeps the first value of 
the key. For more detail, see 
[PIP-215](https://github.com/apache/pulsar/issues/18099).
+
+There is also plugable topic compaction service present. For more details, see 
[PIP-278](https://github.com/apache/pulsar/pull/20624) 
+
+More topic compaction details can be found in [Pulsar Topic 
Compaction](https://pulsar.apache.org/docs/en/concepts-topic-compaction/).
+
+# Motivation
+
+Currently, there are two types of compactors
+available: `TwoPhaseCompactor` and `StrategicTwoPhaseCompactor`. The latter
+is specifically utilized for internal load balancing purposes and is not
+employed for regular compaction of Pulsar topics. On the other hand, the
+former can be configured via `CompactionServiceFactory` in the
+`broker.conf`.
+
+I believe it could be advantageous to introduce another type of topic
+compactor that operates based on event time. Such a compactor would have
+the capability to maintain desired messages within the topic while
+preserving the order expected by external applications. Although
+applications may send messages with the current event time, variations in
+network conditions or redeliveries could result in messages being stored in
+the Pulsar topic in a different order than intended. Implementing event
+time-based checks could mitigate this inconvenience.
+
+# Goals
+* No impact on current topic compation behavior
+* Preserve the order of messages during compaction regardless of network 
latencies 
+
+## In Scope
+* Abstract TwoPhaseCompactor 
+
+* Migrate the current implementation to a new abstraction
+
+* Introduce new compactor based on event time
+
+* Makes existing tests compatible with new implementations.
+
+
+# High Level Design
+
+In order to change the way topic is compacted we need to create 
`EventTimeCompactionServiceFactory`. This service provides a new 
+compactor `EventTimeOrderCompactor` which has a logic similar to existing 
`TwoPhaseCompactor` with a slightly change in algorithm responsible for
+deciding which message is outdated.
+
+New compaction service factory can be enabled via 
`compactionServiceFactoryClassName`
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* Abstract `TwoPhaseCompactor` and move current logic to new 
`PublishingOrderCompactor`
+
+* Implement `EventTimeCompactionServiceFactory` and `EventTimeOrderCompactor`
+
+* Create `MessageCompactionData` as a holder for compaction related data
+
+Example implementation can be found 
[here](https://github.com/apache/pulsar/pull/22517/files)
+
+# Links
+
+* Mailing List discussion thread: 
https://lists.apache.org/thread/nc8r3tm9xv03vl30zrmfhd19q2k308y2
+* Mailing List voting thread: 
https://lists.apache.org/thread/pp6c0qqw51yjw9szsnl2jbgjsqrx7wkn

Reply via email to