This is similar to, but not quite the same behaviour as, the bug I've got open, also with write_graphite over TCP. Mine is that write_graphite doesn't recover if the graphite carbon-cache is restarted (https://github.com/collectd/collectd/issues/430).
I know this isn't the answer, but switching to UDP is a workaround. I don't like the perceived unreliability though, as I'm quite dependent on the metrics being available.

cheers
mike

--
Michael Hart
Arctic Wolf Networks
M: 226.388.4773

On 2013-10-17, at 8:49 PM, ryanL <[email protected]> wrote:

heya. i've compiled 5.4 for linux (centos) at commit 0a161fcfd, and seem to be having a problem that does not exist at 5.1.

my collectd is pretty barebones, just doing snmp polling against network devices every 60s. when first starting it up, i get an established TCP connection to my graphite collector and values get written. then, we get stuck. i can see in tcpdump that collectd is polling the network and getting values, but can't write to graphite. i see this:

    # while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
    collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
    collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
    collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
    collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
    collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)

it stays in this state forever until i restart collectd. upon doing so i'll get one initial blast of collected data, and then we're jammed again.

my relevant collectd config:

    <Plugin write_graphite>
      <Carbon>
        Host "graphite-collector"
        Port "2003"
        Protocol "tcp"
        Prefix "collectd."
        StoreRates false
        AlwaysAppendDS false
        Postfix ""
        EscapeCharacter "_"
      </Carbon>
    </Plugin>

on the collectd 5.1 box, it remains like this:

    $ while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
    collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)

any ideas, or further info i can give you guys?

thanks!
ryan

_______________________________________________
collectd mailing list
[email protected]
http://mailman.verplant.org/listinfo/collectd
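For anyone reading along: CLOSE_WAIT in the lsof output means the remote end (carbon) has closed its side of the connection, and the local process has not yet called close() on its own socket. A sender that never notices this will keep the dead descriptor around, which fits both symptoms in this thread. Below is a minimal Python sketch of the general "detect a closed peer and reconnect" pattern. It is not collectd's code (collectd is written in C), and the ReconnectingSender class and format_metric helper are hypothetical names made up for illustration. The only thing taken from Graphite itself is the plaintext line format, "path value timestamp" followed by a newline.

```python
import select
import socket
import time


def format_metric(path, value, timestamp):
    """Format one metric as a Graphite plaintext line: 'path value timestamp\\n'."""
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


class ReconnectingSender:
    """Hypothetical TCP sender that reconnects when the peer has gone away."""

    def __init__(self, host, port, timeout=5.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.sock = None
        self.reconnects = 0  # how many times a dead socket was replaced

    def _peer_closed(self):
        # If the socket is readable and a peek returns b"", the peer has sent
        # FIN. Locally the connection sits in CLOSE_WAIT until we close() it.
        readable, _, _ = select.select([self.sock], [], [], 0)
        if not readable:
            return False
        try:
            return self.sock.recv(1, socket.MSG_PEEK) == b""
        except OSError:
            return True  # e.g. connection reset by peer

    def _drop(self):
        # Closing the descriptor is what takes the connection out of CLOSE_WAIT.
        self.sock.close()
        self.sock = None
        self.reconnects += 1

    def _ensure_connected(self):
        if self.sock is not None and self._peer_closed():
            self._drop()
        if self.sock is None:
            self.sock = socket.create_connection(
                (self.host, self.port), self.timeout
            )

    def send(self, path, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        line = format_metric(path, value, ts)
        self._ensure_connected()
        try:
            self.sock.sendall(line)
        except OSError:
            # The write failed on a stale socket: reconnect and retry once.
            self._drop()
            self._ensure_connected()
            self.sock.sendall(line)
```

Usage would be something like ReconnectingSender("graphite-collector", 2003).send("collectd.test", 1), with the host and port taken from the config quoted above. The check before each write matters because the first write to a half-closed TCP connection usually succeeds locally, so relying on the send error alone can silently lose one batch of values.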
