Jason918 opened a new issue, #16153:
URL: https://github.com/apache/pulsar/issues/16153
<!---
Instructions for creating a PIP using this issue template:
1. The author(s) of the proposal will create a GitHub issue ticket using
this template.
(Optionally, it can be helpful to send a note discussing the proposal to
[email protected] mailing list before submitting this GitHub issue.
This discussion can
help developers gauge interest in the proposed changes before
formalizing the proposal.)
2. The author(s) will send a note to the [email protected] mailing list
to start the discussion, using subject prefix `[PIP] xxx`. To determine
the appropriate PIP
number `xxx`, inspect the mailing list
(https://lists.apache.org/[email protected])
for the most recent PIP. Add 1 to that PIP's number to get your PIP's
number.
3. Based on the discussion and feedback, some changes might be applied by
the author(s) to the text of the proposal.
4. Once some consensus is reached, there will be a vote to formally approve
the proposal. The vote will be held on the [email protected]
mailing list. Everyone
is welcome to vote on the proposal, though it will considered to be
binding
only the vote of PMC members. It will be required to have a lazy
majority of
at least 3 binding +1s votes. The vote should stay open for at least 48
hours.
5. When the vote is closed, if the outcome is positive, the state of the
proposal is updated and the Pull Requests associated with this proposal
can
start to get merged into the master branch.
-->
## Motivation
The motivation is the same as
[PIP-63](https://github.com/apache/pulsar/wiki/PIP-63%3A-Readonly-Topic-Ownership-Support),
with a new use case support for 100K subscriptions broadcast in a single topic.
1. The bandwidth of a broker limits the number of subscriptions for a single
topic.
2. Subscriptions are competing for the network bandwidth on brokers.
Different subscriptions might have different levels of severity.
3. When synchronizing cross-city message reading, cross-city access needs to
be minimized.
4. [New] Broadcast with 100K subscriptions. There is a limitation of the
subscription number of a single topic. It's tested that with 40K subscriptions
of a single topic, the client needs about 20min to start all the connections,
and under the message producer rate of 1 msg/s, the average end to end latency
is about 3s.
But it's too complicated to implement with original PIP-63 proposal, the
changed code is already over 3K+ lines, see #11960. And there are still some
problems left,
1. The LAC in readonly topic is updated in a polling pattern, which
increases the bookie load bookie.
2. The message data of readonly topic won't be cached in broker. Increase
the network usage between broker and bookie when there are more than one
subscriber is tail-reading.
3. All the subscriptions is managed in original writable-topic, so the
support max subscription number is not scaleable.
This PIP tries to come up with a simpler solution to support readonly topic
ownership and solve the problems the previous PR left.
## Goal
The goal is to implement the feature of Shadow Topic to support readonly
topic ownership.
A shadow topic is the shadow of some normal persistent topic (let's call it
source topic here). The source topic and the shadow topic must have the same
number of partitions or both non-partitioned. Multiply shadow topics can be
created from a source topic.
Shadow topic share the same ledger id list as its source topic. User can't
produce any messages to shadow topic directly and shadow topic don't create any
new ledger for messages, all messages in shadow topic come from source topic.
Shadow topic have its own subscriptions and don't share with its source
topic. This means the shadow topic have its own cursor ledger to store
persistent mark-delete info for each persistent subscriptions.
The idea of this solution come from geo-replication, but shares the same
ledgers to avoid message storage duplication. This is supported by shadow
replication, which is very like geo-replication, with these difference:
1. Geo-replication only works between the same topic in different broker
clusters. But shadow topic have no naming limitation and they can be in the
same cluster.
2. Geo-replication duplicates data storage, but shadow topic won't.
3. Geo-replication replicates data from each other, it's bidirectional, but
shadow replication only have one way data flow.
## API Changes
1. PulsarApi.proto
Shadow topic need to know the original message id of the replicated
messages, in order to update new ledger and lac.
So we need add a `shadow_message_id` in CommandSend for replicator.
```
message CommandSend {
// ...
// message id for shadow topic
optional MessageIdData shadow_message_id = 9;
}
```
2. Admin API for creating shadow topic with source topic
admin.topics().createShadowTopic(source-topic-name, shadow-topic-name)
## Implementation
<img width="1286" alt="image"
src="https://user-images.githubusercontent.com/2770146/174714660-81a87b86-137f-4235-a988-da424768a615.png">
There are two key changes for implementation.
1. How to replicate messages to shadow topics.
2. How shadow topic manage shared ledgers info.
How to replicate messages to shadow topics.
This part is mostly implemented by ShadowReplicator, which extends
PersistentReplicator introduced in geo-replication. The shadow topic list is
added in the topic policy of the source topic. Source topic manage the
lifecycle of all the replicators. The key is to add `shadow_message_id` when
produce message to shadow topics.
How shadow topic manage shared ledgers info.
This part is mostly implemented by ShadowManagedLedger, which extends
current `ManagedLedgerImpl` with two key overrides.
1. `initialize(..)`:
a. Fetch ManagedLedgerInfo of source topic instead of current shadow
topic. The source topic name is stored in the topic policy of the shadow topic.
b. Open the last ledger and read the explicit LAC from bookie, instead
of creating new ledger. Reading LAC here requires that the source topic must
enable explicit LAC feature by set `bookkeeperExplicitLacIntervalInMills` to
non-zero value in broker.conf.
c. Do not start checkLedgerRollTask, which tries roll over ledger
periodically
2. `internalAsyncAddEntry()`: instead of write data to bookie, It only
update metadata of ledgers, like currentLedger, lastConfirmedEntry and put the
replicated message into cache.
Besides, some other problems need to be taken care of. For example,
- LedgerInfos may be updated in source topic by offloader or ledger
deletion, Shadow topic needs to watch the ledger info updating with metadata
store and update in time.
- Refresh LastAddConfirmed when a managed cursor requests entries beyond
known LAC.
## Reject Alternatives
See PIP-63
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]