Re: [Architecture] Cluster message replay

Afkham Azeez Thu, 04 Jul 2013 18:05:30 -0700

On Fri, Jul 5, 2013 at 12:24 AM, Pulasthi Supun <[email protected]> wrote:


> Hi,
>
>
> On Thu, Jul 4, 2013 at 8:00 PM, Paul Fremantle <[email protected]> wrote:
>
>> Some systems (e.g. MQTT) allow for the last message on any given topic to
>> be retained and then replayed to any new subscriber. This model can work
>> well: it encourages the design of a topic space where you just need to know
>> the last message (i.e. the latest state of that particular resource).
>>
>> Paul
>>
>>
>> On 4 July 2013 13:14, Afkham Azeez <[email protected]> wrote:
>>
>>> There could be a situation where, just as a cluster message is sent, a
>>> member momentarily leaves the cluster but rejoins immediately. This
>>> can generally happen when the nodes slow down under load, or due to
>>> intermittent network failures. It could lead to failures, because
>>> crucial cluster messages may not be received by such members.
>>>
>>> To overcome this, or reduce the probability of loss of such messages, we
>>> can replay a certain number of messages when a member joins. On the sender
>>> side, messages over a particular time period can be buffered, and then
>>> replayed when new members join. However, we should ensure that the messages
>>> are idempotent, and messages should declare whether they are idempotent or
>>> not. If a message is not idempotent, we will not replay it. All the
>>> messages we have at the moment are idempotent, AFAIK.
>>>
>>> How does this approach sound?
>>>
>>
> +1. In the new Hazelcast-based implementation, is a member that
> momentarily leaves the cluster and rejoins treated as a new member, as in
> the Tribes-based implementation?
>

Yes. That is how unreliable failure detection normally works, and all
real-world group management systems use unreliable failure detection, so a
member that drops out and rejoins is treated as a new member.


>
> If this is not the case (that is, if some information is kept about the
> member that leaves and rejoins), how about saving some information, such as
> the last message processed by that member? That way, I think the number of
> messages that need to be replayed would be reduced. I am not sure whether
> this is worth the trouble performance-wise, even if it is possible :) :)
>

If a Carbon member leaves and rejoins within about 5 seconds, we can
assume that it was a momentary failure, because Carbon startup normally
takes at least 10s. If it takes longer, we could assume that the member
actually left and rejoined. However, these assumptions don't hold under all
conditions. On a very fast computer, Carbon may start in under 5s, and there
could be a prolonged failure where it takes more than 30s to detect
that a member rejoined. So the safest option is to replay the messages and
handle duplicates.
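
To make "replay and handle duplicates" concrete, here is a minimal Java
sketch of a sender-side buffer plus a receiver-side duplicate check. All of
it is an assumption for illustration: the names ClusterMessage, ReplayBuffer
and DuplicateFilter, the UUID message id, the time window, and the bounded id
cache are hypothetical, and none of this is the existing Carbon or Hazelcast
API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical message: has a unique id and declares whether it is idempotent. */
final class ClusterMessage {
    final UUID id = UUID.randomUUID();
    final String payload;
    final boolean idempotent;
    final long sentAtMillis;

    ClusterMessage(String payload, boolean idempotent, long sentAtMillis) {
        this.payload = payload;
        this.idempotent = idempotent;
        this.sentAtMillis = sentAtMillis;
    }
}

/** Sender side: buffers idempotent messages sent within the last windowMillis. */
final class ReplayBuffer {
    private final long windowMillis;
    private final Deque<ClusterMessage> buffer = new ArrayDeque<>();

    ReplayBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Call after sending; assumes messages arrive in non-decreasing send time. */
    synchronized void recordSent(ClusterMessage msg) {
        evictOlderThan(msg.sentAtMillis - windowMillis);
        if (msg.idempotent) {          // non-idempotent messages are never replayed
            buffer.addLast(msg);
        }
    }

    /** Call when a member joins; returns the messages to resend, oldest first. */
    synchronized List<ClusterMessage> messagesToReplay(long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        return new ArrayList<>(buffer);
    }

    private void evictOlderThan(long cutoffMillis) {
        while (!buffer.isEmpty() && buffer.peekFirst().sentAtMillis < cutoffMillis) {
            buffer.removeFirst();
        }
    }
}

/** Receiver side: remembers the most recent ids so a replayed message runs once. */
final class DuplicateFilter {
    private final int capacity;
    private final Set<UUID> seen = new HashSet<>();
    private final Deque<UUID> order = new ArrayDeque<>();

    DuplicateFilter(int capacity) {
        this.capacity = capacity;
    }

    /** Returns true the first time an id is seen, false for a duplicate. */
    synchronized boolean firstTimeSeen(UUID id) {
        if (!seen.add(id)) {
            return false;
        }
        order.addLast(id);
        if (order.size() > capacity) { // forget the oldest id to bound memory
            seen.remove(order.removeFirst());
        }
        return true;
    }
}
```

In this sketch, the sender would send messagesToReplay(now) to a member when
it joins, and the receiver would call firstTimeSeen(msg.id) before processing
a message. Since only idempotent messages are replayed, the duplicate check
is not required for correctness; it only avoids redundant work.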

>
> Regards,
> Pulasthi
>
>> --
>> Paul Fremantle
>> CTO and Co-Founder, WSO2
>> OASIS WS-RX TC Co-chair, VP, Apache Synapse
>>
>> UK: +44 207 096 0336
>> US: +1 646 595 7614
>>
>> blog: http://pzf.fremantle.org
>> twitter.com/pzfreo
>> [email protected]
>>
>> wso2.com Lean Enterprise Middleware
>>
>
>
> --
> Pulasthi Supun
> Software Engineer; WSO2 Inc.; http://wso2.com,
> Email: [email protected]
> Mobile: +94 (71) 9258281
> Blog : http://pulasthisupun.blogspot.com/
> Git hub profile: https://github.com/pulasthi
>


-- 
Afkham Azeez
Director of Architecture; WSO2, Inc.; http://wso2.com
Member; Apache Software Foundation; http://www.apache.org/
email: [email protected] cell: +94 77 3320919
blog: http://blog.afkham.org
twitter: http://twitter.com/afkham_azeez
linked-in: http://lk.linkedin.com/in/afkhamazeez

Lean . Enterprise . Middleware

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
