Jason918 opened a new issue, #16153:
URL: https://github.com/apache/pulsar/issues/16153

   <!---
   Instructions for creating a PIP using this issue template:
   
    1. The author(s) of the proposal will create a GitHub issue ticket using 
this template.
       (Optionally, it can be helpful to send a note discussing the proposal to
       [email protected] mailing list before submitting this GitHub issue. 
This discussion can
       help developers gauge interest in the proposed changes before 
formalizing the proposal.)
    2. The author(s) will send a note to the [email protected] mailing list
       to start the discussion, using subject prefix `[PIP] xxx`. To determine 
the appropriate PIP
       number `xxx`, inspect the mailing list 
(https://lists.apache.org/[email protected])
       for the most recent PIP. Add 1 to that PIP's number to get your PIP's 
number.
    3. Based on the discussion and feedback, some changes might be applied by
       the author(s) to the text of the proposal.
    4. Once some consensus is reached, there will be a vote to formally approve
       the proposal. The vote will be held on the [email protected] 
mailing list. Everyone
       is welcome to vote on the proposal, though it will considered to be 
binding
       only the vote of PMC members. It will be required to have a lazy 
majority of
       at least 3 binding +1s votes. The vote should stay open for at least 48 
hours.
    5. When the vote is closed, if the outcome is positive, the state of the
       proposal is updated and the Pull Requests associated with this proposal 
can
       start to get merged into the master branch.
   
   -->
   
   ## Motivation
   
   The motivation is the same as 
[PIP-63](https://github.com/apache/pulsar/wiki/PIP-63%3A-Readonly-Topic-Ownership-Support),
 with a new use case support for 100K subscriptions broadcast in a single topic.
   1. The bandwidth of a broker limits the number of subscriptions for a single 
topic.
   2. Subscriptions are competing for the network bandwidth on brokers. 
Different subscriptions might have different levels of severity.
   3. When synchronizing cross-city message reading, cross-city access needs to 
be minimized.
   4. [New] Broadcast with 100K subscriptions. There is a limitation of the 
subscription number of a single topic. It's tested that with 40K subscriptions 
of a single topic, the client needs about 20min to start all the connections, 
and under the message producer rate of 1 msg/s, the average end to end latency 
is about 3s.
   
   But it's too complicated to implement with original PIP-63 proposal, the 
changed code is already over 3K+ lines, see #11960. And there are still some 
problems left,
   1. The LAC in readonly topic is updated in a polling pattern, which 
increases the bookie load bookie.
   2. The message data of readonly topic won't be cached in broker. Increase 
the network usage between broker and bookie when there are more than one 
subscriber is tail-reading.
   3. All the subscriptions is managed in original writable-topic, so the 
support max subscription number is not scaleable.
   
   This PIP tries to come up with a simpler solution to support readonly topic 
ownership and solve the problems the previous PR left.
   
   
   ## Goal
   
   The goal is to implement the feature of Shadow Topic to support readonly 
topic ownership.
   A shadow topic is the shadow of some normal persistent topic (let's call it 
source topic here). The source topic and the shadow topic must have the same 
number of partitions or both non-partitioned. Multiply shadow topics can be 
created from a source topic.
   
   Shadow topic share the same ledger id list as its source topic. User can't 
produce any messages to shadow topic directly and shadow topic don't create any 
new ledger for messages, all messages in shadow topic come from source topic.
   
   Shadow topic have its own subscriptions and don't share with its source 
topic. This means the shadow topic have its own cursor ledger to store 
persistent mark-delete info for each persistent subscriptions.
   
   The idea of this solution come from geo-replication, but shares the same 
ledgers to avoid message storage duplication. This is supported by shadow 
replication, which is very like geo-replication, with these difference:
   1. Geo-replication only works between the same topic in different broker 
clusters. But shadow topic have no naming limitation and they can be in the 
same cluster.
   2. Geo-replication duplicates data storage, but shadow topic won't.
   3. Geo-replication replicates data from each other, it's bidirectional, but 
shadow replication only have one way data flow.
   
   
   ## API Changes
   
   1. PulsarApi.proto
   Shadow topic need to know the original message id of the replicated 
messages, in order to update new ledger and lac.
   So we need add a `shadow_message_id` in CommandSend for replicator.
   ```
   message CommandSend {
      // ...
      // message id for shadow topic
      optional MessageIdData shadow_message_id = 9;
   }
   ```
   2. Admin API for creating shadow topic with source topic
   admin.topics().createShadowTopic(source-topic-name, shadow-topic-name)
   
   
   ## Implementation
   <img width="1286" alt="image" 
src="https://user-images.githubusercontent.com/2770146/174714660-81a87b86-137f-4235-a988-da424768a615.png";>
   
   
   There are two key changes for implementation.
   1. How to replicate messages to shadow topics.
   2. How shadow topic manage shared ledgers info.
   
   How to replicate messages to shadow topics.
   This part is mostly implemented by ShadowReplicator, which extends 
PersistentReplicator introduced in geo-replication. The shadow topic list is 
added in the topic policy of the source topic. Source topic manage the 
lifecycle of all the replicators. The key is to add `shadow_message_id` when 
produce message to shadow topics. 
   
   How shadow topic manage shared ledgers info.
   This part is mostly implemented by ShadowManagedLedger, which extends 
current `ManagedLedgerImpl` with two key overrides.
   1. `initialize(..)`: 
        a. Fetch ManagedLedgerInfo of source topic instead of current shadow 
topic. The source topic name is stored in the topic policy of the shadow topic.
        b. Open the last ledger and read the explicit LAC from bookie, instead 
of creating new ledger. Reading LAC here requires that the source topic must 
enable explicit LAC feature by set `bookkeeperExplicitLacIntervalInMills` to 
non-zero value in broker.conf.
        c. Do not start checkLedgerRollTask, which tries roll over ledger 
periodically
   2. `internalAsyncAddEntry()`: instead of write data to bookie, It only 
update metadata of ledgers, like currentLedger, lastConfirmedEntry and put the 
replicated message into cache.
   Besides, some other problems need to be taken care of. For example, 
   - LedgerInfos may be updated in source topic by offloader or ledger 
deletion, Shadow topic needs to watch the ledger info updating with metadata 
store and update in time.
   - Refresh LastAddConfirmed when a managed cursor requests entries beyond 
known LAC.
   
   ## Reject Alternatives
   
   See PIP-63
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to