If 1) and 2) are easily doable, I would start with them and see where it leads us.

Regards
Carsten

2014-02-07 16:04 GMT+01:00 Stefan Egli <[email protected]>:

> I think this is basically what I proposed - except that I would be careful
> to handle race-conditions with
> startups/cluster-delays/observation-threading and thus propose to have a
> period of a few minutes in which the node assumes it is in a cluster, and
> only afterwards switches back if that's not the case.
>
> Cheers,
> Stefan
>
> On 2/7/14 3:58 PM, "Jörg Hoh" <[email protected]> wrote:
>
> >Hi,
> >
> >some ideas regarding the cluster detection:
> >
> >* When a cluster node comes up, it writes its initial "I am here"
> >announcement to the repo.
> >* The node then enters listening mode, in which it listens for any
> >"pong" or for a new incoming ping, but it doesn't write any cluster
> >heartbeat information yet.
> >* If the node receives a "ping" or a "pong", it knows that it is indeed
> >running in a cluster (either a new partner joined or the node itself
> >joined a cluster) and then starts up the regular cluster heartbeat.
> >
> >In such a case you wouldn't need to handle the cluster case differently
> >from a single node mode, you don't even have to have a timeout.
> >
> >Jörg
> >
> >
> >2014-02-07 Stefan Egli <[email protected]>:
> >
> >> What could be done for level 3:
> >>
> >> a) at startup the behavior is as it is today: cluster-ready, writing
> >> repository-heartbeats as configured
> >> b) this is done for a configured amount of time at least, eg for 5
> >> minutes (exploring phase) - the idea of this being to avoid any
> >> race-conditions of two nodes starting simultaneously
> >> c) if after this time the node realizes that it is alone (and no-one
> >> joined or left during this time), it assumes that it is indeed in a
> >> standalone setup and stops sending heartbeats (solitude phase)
> >> d) if another node starts up in the same cluster, it would as normal
> >> start doing these heartbeats for a few minutes (exploring phase) -
> >> giving the original node time to wake up to the idea that it was never
> >> alone (alien phase) - at which point it quickly goes back to sending
> >> heartbeats and voting and all those things (party phase)
> >>
> >> phase d) is obviously slightly tricky ..
> >>
> >> Cheers,
> >> Stefan
> >>
> >> On 2/7/14 3:00 PM, "Stefan Egli" <[email protected]> wrote:
> >>
> >> >Hi,
> >> >
> >> >I like the idea of reducing write-bandwidth used by topology. I'd sum
> >> >it up in three possible levels though:
> >> >
> >> > 1) keep the (topology-connector) announcement's lastHeartbeat as a
> >> >separate property and only update that (on receiving a
> >> >connector-heartbeat) instead of updating the entire announcement-json
> >> >as is done now.
> >> >
> >> > 2) we might even be able to avoid storing the announcement's
> >> >lastHeartbeat when the logic is changed, such that the announcement is
> >> >valid as long as the recipient of the announcement (ie the owner) is
> >> >alive. This would increase the reaction time on a crash of a remote
> >> >instance though.
> >> >
> >> > 3) avoid repository (ie cluster-local) heartbeats entirely for the
> >> >single-node case (in which case keeping the announcement in memory is
> >> >feasible).
> >> >
> >> >I see level 1 as something we should do, level 2 to be further analyzed
> >> >(verify the implications, but I think it's possible). But I have my
> >> >reservations re level 3, as this would complicate the 'cluster first'
> >> >goal: we'd have to detect situations where a single-node is 'suddenly'
> >> >accompanied by another node to form a cluster, as this would have to be
> >> >detected by discovery.impl. And I fear that this might in the end
> >> >again result in some sort of heartbeat (maybe for a limited time after
> >> >startup only though). The question is whether it's a "problem" to have
> >> >cluster-heartbeats stored every, say, 30 sec and whether that justifies
> >> >complicating the algorithm for this case.
> >> >
> >> >Cheers,
> >> >Stefan
> >> >
> >> >On 2/7/14 2:44 PM, "Jörg Hoh" <[email protected]> wrote:
> >> >
> >> >>Hi,
> >> >>
> >> >>I am wondering whether we could reduce the amount of data persisted in
> >> >>the repository with every topology heartbeat.
> >> >>
> >> >>For example we could just update the timestamp of the announcement
> >> >>heartbeat if the topology hasn't changed at all (instead of writing
> >> >>the complete announcement).
> >> >>
> >> >>A more radical approach would be to avoid persisting topology
> >> >>information to the repo completely if this node isn't part of a
> >> >>cluster at all. All the state could be kept in memory, and in case of
> >> >>crash/restart the topology needs to be gathered again. Of course this
> >> >>would require some more logic in case a single node is promoted to a
> >> >>member of a cluster, as then the current behaviour should be used.
> >> >>
> >> >>WDYT?
> >> >>
> >> >>Jörg
> >> >>
> >> >>
> >> >>--
> >> >>
> >> >>http://cqdump.wordpress.com
> >> >>Twitter: @joerghoh
> >> >
> >>
> >
> >
> >--
> >Cheers,
> >Jörg Hoh,
> >
> >http://cqdump.wordpress.com
> >Twitter: @joerghoh

--
Carsten Ziegeler
[email protected]
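
For reference, the phases from the quoted level 3 proposal (exploring, solitude, and the alien/party transition) could be sketched as a small state machine. This is only an illustration of the idea as described in the thread, not code from discovery.impl: the class name, method names, and the way time and peer observations are passed in are all assumptions made for the example.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of the proposed startup phases.
 * None of these names exist in discovery.impl.
 */
class HeartbeatPhaseSketch {

    /** EXPLORING and PARTY write repository heartbeats, SOLITUDE does not. */
    enum Phase { EXPLORING, SOLITUDE, PARTY }

    private final long exploringMillis;
    private final long startMillis;
    private Phase phase = Phase.EXPLORING;

    HeartbeatPhaseSketch(Duration exploringPeriod, long nowMillis) {
        this.exploringMillis = exploringPeriod.toMillis();
        this.startMillis = nowMillis;
    }

    /**
     * Called periodically. If the exploring period has elapsed and no peer
     * was observed in the meantime, the node assumes it is standalone.
     */
    Phase tick(long nowMillis) {
        if (phase == Phase.EXPLORING && nowMillis - startMillis >= exploringMillis) {
            phase = Phase.SOLITUDE;
        }
        return phase;
    }

    /**
     * Called when another node's announcement or heartbeat is observed.
     * From SOLITUDE this is the "alien" moment; in every phase the node
     * ends up in PARTY and writes heartbeats as usual. Returning to
     * SOLITUDE after peers leave is not covered by this sketch.
     */
    Phase onPeerObserved() {
        phase = Phase.PARTY;
        return phase;
    }

    /** Whether a repository heartbeat should be written in the current phase. */
    boolean shouldWriteHeartbeat() {
        return phase != Phase.SOLITUDE;
    }
}
```

A caller would invoke tick() from the existing heartbeat scheduler and onPeerObserved() from whatever observation mechanism detects the other node. How reliably and how quickly that observation fires is exactly the part the thread calls tricky, and the sketch does not address it.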
