I agree this is something to improve in the write_graphite plugin. I have the same problem, and I think the best way to solve it is to add an "AutoReconnectTimeout" option that closes the connection and reconnects after a set amount of time (for example 1 hour or 1 day), handled inside the plugin to avoid losing data.
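To make the idea concrete, here is a minimal sketch of the age check such an option might need. It is only an illustration under my own assumptions: the struct `wg_conn`, its fields, and the function `wg_needs_reconnect` are hypothetical names, not taken from the write_graphite source, and "AutoReconnectTimeout" is the proposed option, not an existing one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-connection state; these names are illustrative only
 * and are not taken from the write_graphite source. */
struct wg_conn {
  int fd;                   /* socket descriptor, -1 when disconnected */
  time_t connected_at;      /* time the current connection was opened */
  time_t reconnect_timeout; /* proposed "AutoReconnectTimeout" in seconds,
                               0 disables the periodic reconnect */
};

/* Returns true when there is no connection, or when the connection has
 * been open for at least reconnect_timeout seconds. */
bool wg_needs_reconnect(const struct wg_conn *c, time_t now) {
  assert(c != NULL);
  if (c->fd < 0)
    return true;
  if (c->reconnect_timeout <= 0)
    return false;
  return (now - c->connected_at) >= c->reconnect_timeout;
}
```

The plugin could call something like this with `time(NULL)` just before sending a buffer. When it returns true, it would send what is pending, close the socket and open a new one, so that the load balancer gets a chance to pick a node again. Checking between sends rather than in the middle of one is what should avoid losing data.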
I suggest you open an issue as a "Feature Request" on GitHub, because this could be a very useful feature:
https://github.com/collectd/collectd/issues

In a few weeks, if nobody else has done it first, I will try to write a patch to solve this problem.

2014-12-01 23:25 GMT+01:00 Mark Juric <[email protected]>:

> Hi all,
>
> From the write_graphite plugin documentation: "The plugin aims to be very
> efficient. It keeps the TCP connection to *Carbon* open in order to
> minimize the connection handshake overhead." This efficiency is causing me
> a lot of headaches. We're running 2,000 servers through a Netscaler
> load-balancer which distributes the traffic to multiple daemons running on
> multiple nodes in the Graphite cluster. The problem is, if a node dies, the
> load-balancer will (as expected) distribute the traffic to the remaining
> nodes in the cluster. However, because the connections are persistent, it
> doesn't rebalance them once the dead node comes back on-line. This leaves
> the rest of the nodes in an over-worked state, and the revived node almost
> completely unused.
>
> Any thoughts on how or where in the code to best fix this?
>
> Mark
>
> _______________________________________________
> collectd mailing list
> [email protected]
> http://mailman.verplant.org/listinfo/collectd

--
Regards,

Toni Moreno
699706656

*If you would not be forgotten as soon as you are dead and rotten, either write things worth reading, or do things worth the writing.*
*Benjamin Franklin*
