On Aug 3, 2012, at 10:01 PM, Aniruddha Laud wrote:

> Hi Flavio,
> 
> Please find my responses inline.
> 
> On Fri, Aug 3, 2012 at 3:28 AM, Flavio Junqueira wrote:
> 
>> Hi Aniruddha, we had an offline discussion about your question and we have
>> a few clarification questions back:
>> 
>> On Aug 1, 2012, at 3:48 AM, Aniruddha Laud wrote:
>> 
>>> Currently, all cross-region subscriptions have to succeed for a
>>> subscription to succeed locally. I understand that we might lose messages
>>> published in another region if this doesn't succeed the very first time,
>> 
>> Does "very first time" mean receiving confirmation from at least one
>> region? How do we decide that it is ok to acknowledge locally that a
>> subscription went through?
>> 
> Say we get a subscription in region ABC from subscriber id SUB-1 for topic
> FOO-1. As of now, the hub responsible for FOO-1 will try to remotely
> subscribe to all regions for that topic, and the local subscription will
> succeed only after all remote subscriptions succeed. Say the regions are
> ABC, PQR and XYZ: region XYZ will have a subscription for topic FOO-1 from
> subscriber id __ABC, and the same goes for region PQR. If any of the remote
> subscriptions fails, this local subscription should fail because we cannot
> guarantee that messages published in all remote regions will be delivered.

Sounds right. I think you're essentially saying that it is sufficient to have 
one live subscription between any pair of regions. 
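
To make the rule in the quoted paragraph concrete, here is a minimal Python sketch of the all-or-nothing behaviour. It is not Hedwig code: `subscribe_locally`, `remote_subscribe` and `flaky_remote_subscribe` are hypothetical names used only for illustration. Only the region names and the `__ABC`-style subscriber id come from the description above.

```python
from typing import Callable, Iterable


def subscribe_locally(
    topic: str,
    local_region: str,
    all_regions: Iterable[str],
    remote_subscribe: Callable[[str, str, str], bool],
) -> bool:
    """Return True only if every remote region accepted the subscription."""
    # The hub subscribes remotely under a region-wide id such as "__ABC".
    hub_subscriber_id = "__" + local_region
    for region in all_regions:
        if region == local_region:
            continue  # no remote subscription needed for our own region
        if not remote_subscribe(region, topic, hub_subscriber_id):
            # One failure is enough: delivery from that region cannot be guaranteed.
            return False
    return True


# Hypothetical stand-in for the cross-region call: region XYZ is unreachable.
def flaky_remote_subscribe(region: str, topic: str, subscriber_id: str) -> bool:
    return region != "XYZ"


ok = subscribe_locally("FOO-1", "ABC", ["ABC", "PQR", "XYZ"], flaky_remote_subscribe)
print(ok)  # False, because the subscription to XYZ failed
```

The sketch stops at the first failure and does not undo remote subscriptions that already succeeded; whether a real hub should clean those up is outside what this thread settles.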

> Let's assume that all remote subscriptions succeeded. Now, region ABC goes
> down temporarily because of a network partition or because the hubs are
> being restarted. If at this point, we get a subscription from subscriber
> SUB-2 for topic FOO-1 to the same hub, we need not subscribe to the remote
> regions as a subscription already exists and is being delivered to the
> HubClient on this hub. Moreover, any subsequent subscription to that topic
> need not wait for acknowledgements from remote regions because as far as
> regions PQR and XYZ go, they have a subscription from subscriber id __ABC
> already and all messages published in those regions are guaranteed to be
> delivered to this subscription.

I can't remember now: do we keep information about consumed messages locally to 
a subscriber, even for messages published remotely? If so, then your observation 
seems right to me.
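
The scenario in the quoted paragraph (SUB-2 subscribing while remote regions are unreachable) might be sketched as follows. This is a toy model under stated assumptions, not the real hub: `HubSketch` and `remote_subscribe_all` are hypothetical names, and the state is kept in memory on a single hub.

```python
from typing import Callable, Dict, Set


class HubSketch:
    """Toy model of one hub; not the real Hedwig hub."""

    def __init__(self, remote_subscribe_all: Callable[[str], bool]) -> None:
        # Hypothetical callback: subscribe to the topic in every remote region.
        self._remote_subscribe_all = remote_subscribe_all
        self._local_subscribers: Dict[str, Set[str]] = {}

    def subscribe(self, topic: str, subscriber_id: str) -> bool:
        subscribers = self._local_subscribers.setdefault(topic, set())
        if not subscribers:
            # First local subscriber: remote regions must be subscribed first.
            if not self._remote_subscribe_all(topic):
                return False
        # Later subscribers reuse the existing cross-region subscription.
        subscribers.add(subscriber_id)
        return True


remote_calls = []


def remote_subscribe_all(topic: str) -> bool:
    remote_calls.append(topic)
    return True


hub = HubSketch(remote_subscribe_all)
hub.subscribe("FOO-1", "SUB-1")
hub.subscribe("FOO-1", "SUB-2")
print(len(remote_calls))  # 1: only SUB-1 triggered remote subscriptions
```

Because the state here lives in one hub's memory, it is lost when the topic moves to another hub, which is the gap the ZooKeeper-based proposal later in the thread is meant to close.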

> 
>>> but subsequent subscribe requests should not be affected as the other
>>> regions are already aware of the remote subscription (that succeeded the
>>> first time).
>> 
>> Subscribe requests here means requests from clients local to the same
>> region, correct?
>> 
> Yes.
> 
>> 
>>> A possible fix would be to store information in the local
>>> zookeeper about existing remote subscriptions.
>> 
>> Store what information exactly? Does this information prevent concurrently
>> published messages from being lost? If we don't wait for all regions to
>> acknowledge, then we might end up violating the contract that says that
>> once a client successfully subscribes to a topic, then delivery from that
>> point on is guaranteed.
>> 
> For every topic node in ZK, we have a child node named remote which stores
> the remote regions for which subscriptions have succeeded. We can make use
> of the onLastLocalUnsubscribe function in RegionManager to clear the state
> and the onFirstLocalSubscribe to check-and-set it. We need to store this in
> ZK because this information has to be made available to all Hubs even as
> topics move around locally. I don't think it would violate any existing
> contracts. You are still guaranteed delivery of messages in all regions.

Got it, makes sense. I believe false suspicions of hub crashes could cause 
concurrent accesses to the same znode. If so, then we need to check that the 
znode version matches the one read, to prevent concurrent accesses from causing 
inconsistencies. Is this right?
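
As an illustration of the version check, here is a small in-memory Python sketch of a versioned check-and-set. ZooKeeper's `setData` takes an expected version and rejects the write when it does not match the znode's current version; the sketch only imitates that behaviour. `VersionedNode`, `BadVersionError` and `add_region` are hypothetical names, not ZooKeeper or Hedwig APIs.

```python
from typing import FrozenSet, Tuple


class BadVersionError(Exception):
    """Raised when the caller's version is stale (hypothetical name)."""


class VersionedNode:
    """In-memory stand-in for the per-topic "remote" znode."""

    def __init__(self) -> None:
        self._regions: FrozenSet[str] = frozenset()
        self._version = 0

    def read(self) -> Tuple[FrozenSet[str], int]:
        return self._regions, self._version

    def set(self, regions: FrozenSet[str], expected_version: int) -> None:
        # Reject the write if someone else updated the node after our read.
        if expected_version != self._version:
            raise BadVersionError("node changed since it was read")
        self._regions = regions
        self._version += 1


def add_region(node: VersionedNode, region: str) -> None:
    """Check-and-set with retry: re-read and reapply on a version conflict."""
    while True:
        regions, version = node.read()
        try:
            node.set(regions | {region}, version)
            return
        except BadVersionError:
            continue  # another hub wrote first; retry against the new state


node = VersionedNode()
stale_regions, stale_version = node.read()  # hub A reads version 0
add_region(node, "PQR")                     # hub B writes, version becomes 1
try:
    node.set(stale_regions | {"XYZ"}, stale_version)  # hub A's stale write
except BadVersionError:
    add_region(node, "XYZ")                 # hub A retries from a fresh read
print(sorted(node.read()[0]))  # ['PQR', 'XYZ']
```

Without the version check, hub A's stale write would overwrite the node with only XYZ and silently drop PQR, which is the kind of inconsistency the question above is about.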

-Flavio
