> I think that there are still some references to shadow_message_id but
> IIUC we don't need to add that field anymore because we are going to
> use


> optional MessageIdData message_id = 9;


> is this correct ?


Yes, no more `shadow_message_id`. Sorry I missed these references, already
updated.

Thanks,
Haiting

On Fri, Jul 29, 2022 at 5:15 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang
> <jianghait...@apache.org> ha scritto:
> >
> > Hi Enrico,
> >
> > Any further suggestion on this PIP?
> > If not, I would like to raise a  revote on this in a few days.
>
> now the PIP LGTM
>
> I think that there are still some references to shadow_message_id but
> IIUC we don't need to add that field anymore because we are going to
> use
>
> optional MessageIdData message_id = 9;
>
> is this correct ?
>
> Enrico
>
> >
> > Thanks,
> > Haiting
> >
> > On 2022/07/07 11:30:59 Haiting Jiang wrote:
> > > Hi Enrico,
> > >
> > > Thanks for your feedback.
> > >
> > > On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > > > I have a couple of additional questions.
> > > >
> > > > 1. Security
> > > > What about security permissions about the shadow topic ?
> > > > We are reading from another topic.
> > > > I think we must clarify the decisions in the PIP
> > >
> > > As shadow topic is usually in another namespace, it would have its own
> > > independent permission settings, and we can configure different
> permissions
> > > for source topic and shadow topic. So there would be no guarantee that
> you are
> > > allowed to consume shadow topic if you have permission to consume
> source
> > > topic.
> > >
> > > On the other hand, we uses topic policy to store shadow topic
> settings, so a
> > > new policy permission item needs be added as PolicyName.SHADOW_TOPIC,
> and user
> > > must have PolicyOperation.WRITE to this policy to create/delete shadow
> topics.
> > >
> > > >
> > > > 2. Truncation and deletion
> > > > What happens when you truncate or delete the source topic ?
> > > > please add a paragraph on the PIP
> > > >
> > >
> > > 1. Truncation, from command `bin/pulsar-admin topics truncate
> source-topic`.
> > > For source topic truncation, nothing changes. It still move all
> cursors to the
> > > end of the topic and delete all inactive ledgers.
> > > As shadow topic will watch `ManagedLedgerInfo` in metadata store, once
> it
> > > knows ledgers deleted, all cursors will skip all deleted ledgers.
> > >
> > > 2. Deletion, from command `bin/pulsar-admin topics delete
> source-topic`.
> > > Like geo-replication, topic deletion is forbidden if topic have shadow
> > > replicators, users have to delete shadow topics first. Here is the new
> admin
> > > API for managing shadow topics with source topic in
> > > `org.apache.pulsar.client.admin.Topics` :
> > > ```
> > > void createShadowTopic(String sourceTopicName, String shadowTopicName);
> > > void deleteShadowTopic(String sourceTopicName, String shadowTopicName);
> > > List<String> admin.topics().getShadowTopics(String sourceTopicName);
> > >
> > > //And their async version methods.
> > > ```
> > > And this requires new REST interfaces in admin server, where
> > > ```
> > > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> > > METHOD = POST/DELETE/GET;
> > > ```
> > >
> > > > 3. Offloaders
> > > > We are talking about BK metadata, how do Shadow Topics work with
> > > > Offloaded ledgers ?
> > > > Please clarify in the PIP
> > >
> > > Offloading a ledger is a kind of writing operation to topic's
> metadata, so
> > > shadow topic can't offload ledgers to other long term storage.
> However, for
> > > ledgers thats are already offloaded by source topic, it's expected to
> support
> > > reading from offload ledgers in shadow topic, just like read from
> source
> > > topic.
> > >
> > > The implementation depends on shadow topic watching
> `ManagedLedgerInfo` in
> > > metadata store, and if LedgerInfo.offloadContext is updated by source
> topic
> > > offloader, shadow topic can get fully information to get a readHandle
> from
> > > ledgerOffload. And of course, the pre-condition is the shadow topic
> must have
> > > the same offload driver settings.
> > >
> > > >
> > > > 4. Changes in the number of partitions
> > > > the PIP says that the number of partitions must match the source
> topic.
> > > > Are we preventing changes to the number of partitions in the source
> topic ?
> > > >
> > >
> > > No, the updates on partition number will be synced to the shadow topic.
> > > A source topic or partition will be responsible for the creation and
> deletion
> > > of its corresponding shadow topic partitions.
> > >
> > > > 5. Topic stats
> > > > We should add information on the source topic and on the shadow
> topic.
> > > > Please clarify or draft your intentions in the PIP
> > > >
> > >
> > > For topic stats on source topic, as shadow replicator will reuse most
> of current
> > > PersistentReplicator, the ReplicatorStatsImpl also can be applied to
> shadow replicators.
> > > And we need to add a new field in `TopicStatsImpl` like
> geo-replication:
> > > ```
> > > Map<String /*shadow topic name*/, ReplicatorStatsImpl>
> shadowReplication;
> > > ```
> > >
> > > As for topic stats on shadow topic, previous `TopicStatsImpl` still
> applies.
> > > And I don't see any other stats need to be added at this point.
> > >
> > >
> > > > 6. GeoReplication
> > > > I guess that GeoReplication will not be possible for shadow topics.
> > > > Please clarify on the PIP
> > > >
> > >
> > > Yes, this is decided by the nature of shadow topic that it don't have
> the write access to BK.
> > > And I don't see the necessary of supporting GeoReplication for shadow
> topics.
> > > We can make source topic geo-replicated and create the same shadow
> topic in each clusters.
> > >
> > > > I believe that this feature is very powerful, but we must design it
> > > > carefully and discuss
> > > > about all the edge cases. Otherwise we will end up in something that
> > > > is half-baked
> > > > and we will have to resolve edge cases while developing or after
> going
> > > > to production.
> > > >
> > > > Every feature must be fully integrated with the rest of Pulsar
> > > >
> > > > Enrico
> > > >
> > > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > > > <jianghait...@apache.org> ha scritto:
> > > > >
> > > > > Hi Penghui
> > > > >
> > > > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > > > Hi Haiting,
> > > > > >
> > > > > > Thanks for the explanation. I'm clear for now.
> > > > > >
> > > > > > Pulsar functions also can do such things by connecting data from
> one topic
> > > > > > to another topic.
> > > > > > But the difference is this proposal only copies the data to the
> cache of
> > > > > > another topic, and the data not
> > > > > > in the cache is also available by reading from ledgers.
> > > > > >
> > > > > > And this approach also follows benefits compared with
> replicating data to
> > > > > > multiple "real" topics.
> > > > > >
> > > > > > - reuse the topic metadata
> > > > > > - the same message ID which easy for troubleshooting
> > > > > >
> > > > > > Just one question
> > > > > >
> > > > > > >>>>>>>
> > > > > > ```
> > > > > > message CommandSend { // ... // message id for shadow topic
> optional
> > > > > >    MessageIdData shadow_message_id = 9; }
> > > > > > ```
> > > > > >
> > > > > > Can we get the message ID from the replicated data to avoid
> introducing a
> > > > > > new command?
> > > > > > Or use a marker message to avoid broker-to-broker directly
> protobuf command
> > > > > > interaction.
> > > > > >
> > > > > Sorry for not wrote it clearly. CommandSend is not a new command.
> It's exactly the main
> > > > > command producer used to send message to broker. The only change
> is add a new field in it.
> > > > > The whole command proto would be like this:
> > > > > ```
> > > > > message CommandSend {
> > > > >     required uint64 producer_id = 1;
> > > > >     required uint64 sequence_id = 2;
> > > > >     optional int32 num_messages = 3 [default = 1];
> > > > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > > > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > > > >
> > > > >     /// Add highest sequence id to support batch message with
> external sequence id
> > > > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > > > >     optional bool is_chunk     =7 [default = false];
> > > > >
> > > > >     // Specify if the message being published is a Pulsar marker
> or not
> > > > >     optional bool marker = 8 [default = false];
> > > > >
> > > > >     // message id for shadow topic
> > > > >     optional MessageIdData shadow_message_id = 9;
> > > > > }
> > > > > ```
> > > > > So there won't be any broker-to-broker directly protobuf command
> interactions.
> > > > >
> > > > > Thanks,
> > > > > Haiting
> > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <
> jianghait...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Penghui & Asaf:
> > > > > > >
> > > > > > > Please allow me to provide some more detailes about
> **metadata**
> > > > > > > synchronization
> > > > > > > between source topic and shadow topic.
> > > > > > >
> > > > > > > 1.When shadow topic initializes, it will read from metadata
> store path
> > > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains
> all the
> > > > > > > managed ledger info. We don't
> > > > > > > need to read the  ledger information from source topic broker.
> > > > > > >
> > > > > > > 2. When shadow topic received new message from replicator, if
> the ledger
> > > > > > > id of the message
> > > > > > >  is the same as the last ledger, it just updates the LAC. If
> not, it will
> > > > > > > update ledger list from metadata,
> > > > > > > and then open the new ledger handle and update the LAC.
> > > > > > >
> > > > > > > As for the copy itself and add shadow message id in
> CommandSend, it mostly
> > > > > > > serves the purpose
> > > > > > > of filling the EntryCache.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Haiting
> > > > > > >
> > > > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > > > One question comes to mind here: Why not simply read the
> ledger
> > > > > > > information
> > > > > > > > from original topic, without copy?
> > > > > > > >
> > > > > > > > I think this is a good idea.
> > > > > > > >
> > > > > > > > Penghui
> > > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote:
> > > > > > > > >
> > > > > > > > > One question comes to mind here: Why not simply read the
> ledger
> > > > > > > information
> > > > > > > > > from original topic, without copy?
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > > BR,
> > > Haiting
> > >
>

Reply via email to