Hi,

some ideas regarding the cluster detection:

* When a cluster node comes up, it writes its initial "I am here"
announcement (a "ping") to the repo.
* The node then enters listening mode, in which it listens for any
"pong" or for a new ping coming in, but it doesn't write any cluster
heartbeat information.
* If the node receives a "ping" or a "pong", it knows that it is indeed
running in a cluster (either a new partner joined or the node itself joined
a cluster) and then starts up the regular cluster heartbeat.

With this approach you wouldn't need to handle the cluster case differently
from single-node mode; you wouldn't even need a timeout.
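
To make the idea more concrete, here is a rough sketch of the state handling
in Java. It is not taken from discovery.impl; the class, the Mode and Message
enums and the repoWrites list (a stand-in for writes to the repository) are
all hypothetical names made up for illustration. It also assumes that a node
answers a foreign ping with a pong, which the steps above imply but do not
spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterDetectionSketch {

    enum Mode { STARTING, LISTENING, HEARTBEATING }

    enum Message { PING, PONG }

    /** Hypothetical node; repoWrites stands in for writes to the repository. */
    static class Node {
        final String id;
        Mode mode = Mode.STARTING;
        final List<String> repoWrites = new ArrayList<>();

        Node(String id) {
            this.id = id;
        }

        /** Write the single "I am here" announcement, then only listen. */
        void start() {
            repoWrites.add("PING from " + id);
            mode = Mode.LISTENING;
        }

        /** Any ping or pong from another node proves we are in a cluster. */
        void onMessage(Message msg, String senderId) {
            if (senderId.equals(id)) {
                return; // our own announcement proves nothing
            }
            if (msg == Message.PING) {
                // Assumption: answer with a pong so the newcomer learns about us.
                repoWrites.add("PONG from " + id);
            }
            mode = Mode.HEARTBEATING;
        }

        /** Called periodically; writes a heartbeat only once clustered. */
        void tick() {
            if (mode == Mode.HEARTBEATING) {
                repoWrites.add("HEARTBEAT from " + id);
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        a.start();
        a.tick(); // still alone: nothing is written

        Node b = new Node("b");
        b.start();
        a.onMessage(Message.PING, "b"); // a sees b's announcement
        b.onMessage(Message.PONG, "a"); // b sees a's answer
        a.tick();
        b.tick();

        System.out.println(a.id + ": " + a.mode + " " + a.repoWrites);
        System.out.println(b.id + ": " + b.mode + " " + b.repoWrites);
    }
}
```

Note that a node which never sees a foreign ping or pong stays in LISTENING
forever and writes nothing after its announcement, which is why no timeout is
needed.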

Jörg




2014-02-07 Stefan Egli <e...@adobe.com>:

> What could be done for level 3:
>
>  a) at startup the behavior is as is today, cluster-ready, writing
> repository-heartbeats as configured
>  b) this is done for a configured amount of time at least, eg for 5
> minutes (exploring phase) - the idea of this being to avoid any
> race-conditions of two nodes starting simultaneously
>  c) if after this time the node realizes that it is alone (and no-one
> joined or left during this time), it assumes that it is indeed in a
> standalone setup and stops sending heartbeats (solitude phase)
>  d) if another node starts up in the same cluster, it would as normal
> start doing these heartbeats for a few minutes (exploring phase) - giving
> the original node time to wake up to the idea that it was never alone
> (alien phase) - at which point it quickly starts to go back to sending
> heartbeats and voting and all those things (party phase)
>
> phase d) is obviously slightly tricky ..
>
> Cheers,
> Stefan
>
> On 2/7/14 3:00 PM, "Stefan Egli" <e...@adobe.com> wrote:
>
> >Hi,
> >
> >I like the idea of reducing write-bandwidth used by topology. I'd sum it
> >into three possible levels though:
> >
> > 1) keep the (topology-connector) announcement's lastHeartbeat as a
> >separate property and only update that (on receiving a
> >connector-heartbeat) instead of updating the entire announcement-json as
> >is now.
> >
> > 2) we might even be able to avoid storing the announcement's
> >lastHeartbeat at all if the logic is changed such that the announcement is
> >valid as long as the recipient of the announcement (ie the owner) is
> >alive. This would increase the reaction time on a crash of a remote
> >instance, though.
> >
> > 3) avoid repository (ie cluster-local) heartbeats entirely for the
> >single-node case (in which case keeping the announcement in memory is
> >feasible).
> >
> >I see level 1 as something we should do, level 2 to be further analyzed
> >(verify the implications, but I think it's possible). But I have my
> >reservations re level 3, as this would complicate the 'cluster first'
> >goal: we'd have to detect situations where a single-node is 'suddenly'
> >accompanied by another node to form a cluster, as this would have to be
> >detected by discovery.impl. And I fear that this might in the end
> >again result in some sort of heartbeat (maybe for a limited time after
> >startup only though). The question is whether it's a "problem" to have
> >cluster-heartbeats stored every, say, 30 sec and whether that justifies
> >complicating the algorithm for this case.
> >
> >Cheers,
> >Stefan
> >
> >On 2/7/14 2:44 PM, "Jörg Hoh" <jhoh...@googlemail.com> wrote:
> >
> >>Hi,
> >>
> >>I am wondering whether we could reduce the amount of data persisted in
> >>the repository with every topology heartbeat.
> >>
> >>For example we could just update the timestamp of the announcement
> >>heartbeat if the topology hasn't changed at all (instead of writing the
> >>complete announcement).
> >>
> >>A more radical approach would be to avoid persisting topology
> >>information to the repo completely if this node isn't part of a cluster at
> >>all. All the state could be kept in memory, and in case of crash/restart
> >>the topology needs to be gathered again. Of course this would require some
> >>more logic in case a single node is promoted to a member of a
> >>cluster, as then the current behaviour should be used.
> >>
> >>WDYT?
> >>
> >>Jörg
> >>
> >>
> >>--
> >>
> >>http://cqdump.wordpress.com
> >>Twitter: @joerghoh
> >
>
>


-- 
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh
