What could be done for level 3:

 a) at startup the behavior is as is today, cluster-ready, writing
    repository-heartbeats as configured
 b) this is done for a configured amount of time at least, eg for 5
    minutes (exploring phase) - the idea of this being to avoid any
    race-conditions of two nodes starting simultaneously
 c) if after this time, the node realizes, that it is alone (and no-one
    joined or left during this time), it assumes that it is indeed in a
    standalone setup and stops sending heartbeats (solitude phase)
 d) if another node starts up in the same cluster, it would as normal
    start doing these heartbeats for a few minutes (exploring phase) -
    giving the original node time to wake up to the idea that it was
    never alone (alien phase) - at which point it quickly starts to go
    back to sending heartbeats and voting and all those things (party
    phase)
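To make the phases above a bit more concrete, here is a rough Java sketch of the transitions. It is only an illustration under assumptions: the class `HeartbeatPhases`, the method `tick` and its parameters are hypothetical names and not part of discovery.impl. The brief 'alien phase' is folded into the jump from SOLITUDE to PARTY, and falling back to EXPLORING when a node ends up alone again is my own assumption, not something discussed in this thread. It also assumes the node keeps reading the repository while in solitude, otherwise it could never notice a newcomer.

```java
import java.time.Duration;

/** Hypothetical sketch of the level-3 phases; not actual discovery.impl code. */
class HeartbeatPhases {
    public enum Phase { EXPLORING, SOLITUDE, PARTY }

    private final Duration exploringPeriod;   // e.g. 5 minutes, as in b)
    private Phase phase = Phase.EXPLORING;    // a) start cluster-ready
    // Time since startup or since the last observed join/leave.
    private Duration quietFor = Duration.ZERO;

    public HeartbeatPhases(Duration exploringPeriod) {
        this.exploringPeriod = exploringPeriod;
    }

    /**
     * Called periodically.
     * @param elapsed           time since the previous call
     * @param visibleNodes      nodes seen in the repository, including this one
     * @param membershipChanged whether someone joined or left since the previous call
     */
    public Phase tick(Duration elapsed, int visibleNodes, boolean membershipChanged) {
        quietFor = membershipChanged ? Duration.ZERO : quietFor.plus(elapsed);
        if (visibleNodes > 1) {
            // d) someone else is around: heartbeats and voting are needed (again)
            phase = Phase.PARTY;
        } else if (phase == Phase.PARTY) {
            // Assumption (not covered in the thread): alone again, so re-explore
            phase = Phase.EXPLORING;
            quietFor = Duration.ZERO;
        } else if (phase == Phase.EXPLORING
                && quietFor.compareTo(exploringPeriod) >= 0) {
            // c) alone and nothing changed for the whole exploring period
            phase = Phase.SOLITUDE;
        }
        return phase;
    }

    /** Heartbeats are written in every phase except solitude. */
    public boolean sendsHeartbeats() {
        return phase != Phase.SOLITUDE;
    }
}
```

The heartbeat job would call `tick(...)` on every run and skip the repository write whenever `sendsHeartbeats()` returns false.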
phase d) is obviously slightly tricky ..

Cheers,
Stefan

On 2/7/14 3:00 PM, "Stefan Egli" <[email protected]> wrote:

>Hi,
>
>I like the idea of reducing write-bandwidth used by topology. I'd sum it
>into three possible levels though:
>
> 1) keep the (topology-connector) announcement's lastHeartbeat as a
>separate property and only update that (on receiving a
>connector-heartbeat) instead of updating the entire announcement-json as
>is done now.
>
> 2) we might even be able to avoid storing the announcement's
>lastHeartbeat if the logic is changed such that the announcement is
>valid as long as the recipient of the announcement (ie the owner) is
>alive. This would lengthen the reaction time on crash of a remote
>instance though.
>
> 3) avoid repository (ie cluster-local) heartbeats entirely for the
>single-node case (in which case keeping the announcement in memory is
>feasible).
>
>I see level 1 as something we should do, level 2 to be further analyzed
>(verify the implications, but I think it's possible). But I have my
>reservations re level 3, as this would complicate the 'cluster first'
>goal: we'd have to detect situations where a single-node is 'suddenly'
>accompanied by another node to form a cluster, as this would have to be
>detected by discovery.impl. And I fear that this might in the end
>again result in some sort of heartbeat (maybe for a limited time after
>startup only though). Question is, whether it's a "problem" to have
>cluster-heartbeats stored every say 30 sec and whether that justifies
>complicating the algorithm for this case.
>
>Cheers,
>Stefan
>
>On 2/7/14 2:44 PM, "Jörg Hoh" <[email protected]> wrote:
>
>>Hi,
>>
>>I am wondering whether we could reduce the amount of data persisted in
>>the repository with every topology heartbeat.
>>
>>For example we could just update the timestamp of the announcement
>>heartbeat, if the topology hasn't changed at all (instead of writing the
>>complete announcement).
>>
>>A more radical approach would be to avoid persisting topology
>>information to the repo completely, if this node isn't part of a cluster
>>at all. All the state could be kept in memory, and in case of
>>crash/restart the topology needs to be gathered again. Of course this
>>would require some more logic in case a single node is being promoted to
>>a member of a cluster, as then the current behaviour should be used.
>>
>>WDYT?
>>
>>Jörg
>>
>>
>>--
>>
>>http://cqdump.wordpress.com
>>Twitter: @joerghoh
>
