Re: Sling Topology heartbeat: reduce the amount of repo write activity

Stefan Egli Fri, 07 Feb 2014 06:03:26 -0800

Hi,

I like the idea of reducing the write bandwidth used by topology. I'd
break it down into three possible levels, though:

 1) keep the (topology-connector) announcement's lastHeartbeat as a
separate property and only update that (on receiving a
connector-heartbeat) instead of updating the entire announcement-json as
is done now.

 2) we might even be able to avoid storing the announcement's
lastHeartbeat altogether if the logic is changed such that the
announcement stays valid as long as the recipient of the announcement
(i.e. the owner) is alive. This would lengthen the reaction time to a
crash of a remote instance, though.

 3) avoid repository (i.e. cluster-local) heartbeats entirely for the
single-node case (in which case keeping the announcement in memory is
feasible).
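
To make level 1 concrete, here is a minimal sketch of the idea, not the
actual discovery.impl code. It assumes the announcement JSON and the
heartbeat timestamp live in two separate properties; the class name, the
property names and the in-memory map standing in for the repository are
all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the repository node holding one announcement.
class AnnouncementStore {
    private final Map<String, String> properties = new HashMap<>();
    private long charsWritten = 0; // rough measure of write volume

    private void setProperty(String name, String value) {
        properties.put(name, value);
        charsWritten += value.length();
    }

    // Level 1: rewrite the announcement JSON only when it changed,
    // but always refresh the (small) lastHeartbeat property.
    void onConnectorHeartbeat(String announcementJson, long nowMillis) {
        if (!announcementJson.equals(properties.get("topologyAnnouncement"))) {
            setProperty("topologyAnnouncement", announcementJson);
        }
        setProperty("lastHeartbeat", Long.toString(nowMillis));
    }

    // An announcement counts as valid while its last heartbeat is recent enough.
    boolean isValid(long nowMillis, long timeoutMillis) {
        String last = properties.get("lastHeartbeat");
        return last != null && nowMillis - Long.parseLong(last) <= timeoutMillis;
    }

    long charsWritten() {
        return charsWritten;
    }

    public static void main(String[] args) {
        AnnouncementStore store = new AnnouncementStore();
        String json = "{\"instances\":[\"a\",\"b\"]}";
        store.onConnectorHeartbeat(json, 1000L);  // writes JSON + timestamp
        store.onConnectorHeartbeat(json, 31000L); // writes timestamp only
        System.out.println("chars written: " + store.charsWritten());
        System.out.println("valid at 40s: " + store.isValid(40000L, 60000L));
    }
}
```

With an unchanged topology, every heartbeat after the first one only
touches the timestamp property, which is the saving level 1 is after.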

I see level 1 as something we should do, and level 2 as something to be
analyzed further (we need to verify the implications, but I think it's
possible). But I have my reservations re level 3, as this would
complicate the 'cluster first' goal: discovery.impl would have to detect
situations where a single node is 'suddenly' joined by another node to
form a cluster. And I fear that this might in the end again result in
some sort of heartbeat (maybe only for a limited time after startup,
though). The question is whether it's a "problem" to have
cluster-heartbeats stored every, say, 30 sec, and whether that justifies
complicating the algorithm for this case.
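
To put a number on that question, here is a small back-of-the-envelope
calculation. It takes the 30 sec figure above as an example interval
only (not necessarily the configured default) and assumes one repository
write per heartbeat; the class and method names are made up.

```java
import java.time.Duration;

// Hypothetical helper: how many heartbeat writes does one instance cause per day?
class HeartbeatWriteRate {
    static long writesPerDay(long intervalSeconds) {
        // Seconds in a day divided by the heartbeat interval.
        return Duration.ofDays(1).getSeconds() / intervalSeconds;
    }

    public static void main(String[] args) {
        // 86400 s / 30 s = 2880 writes per day and instance
        System.out.println(writesPerDay(30));
    }
}
```

So a 30 sec interval means 2880 small writes per instance per day, which
is the cost to weigh against the added complexity of level 3.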

Cheers,
Stefan

On 2/7/14 2:44 PM, "Jörg Hoh" <[email protected]> wrote:

>Hi,
>
>I am wondering whether we could reduce the amount of data persisted in
>the repository with every topology heartbeat.
>
>For example, we could just update the timestamp of the announcement
>heartbeat if the topology hasn't changed at all (instead of writing the
>complete announcement).
>
>A more radical approach would be to avoid persisting topology
>information to the repo completely if this node isn't part of a cluster
>at all. All the state could be kept in memory, and in case of
>crash/restart the topology needs to be gathered again. Of course this
>would require some more logic in case a single node is promoted to a
>member of a cluster, as then the current behaviour should be used.
>
>WDYT?
>
>Jörg
>
>
>-- 
>
>http://cqdump.wordpress.com
>Twitter: @joerghoh
