We're seeing some strange behavior with the exec plugin. It works great for a short period of time (usually a few hours) and then stops reporting. I've also confirmed that the configured scripts are not being exec'd once collectd gets into this funny state, so it appears not to be a network/reporting problem, but a problem with the exec plugin itself. All other aspects of collectd work fine while the exec plugin is in this state.
Basic info:

  $ uname -a
  Linux fs5b.rs.github.com 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux
  $ collectd --help
  <snip>
  collectd 4.8.1, http://collectd.org/

Once I notice the plugin has stopped reporting, I have an extra process (28489) hanging around:

  $ pstree -apu 22935
  collectdmon,22935 -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
    collectd,22936 -C /etc/collectd/collectd.conf -f
      collectd,28489 -C /etc/collectd/collectd.conf -f
      {collectd},22937
      {collectd},22938
      {collectd},22939
      {collectd},22940
      {collectd},22941
      {collectd},28487

That process seems to exist only when the exec plugin is no longer reporting. Sometimes there are two of these processes.

strace reports that the extra process is blocked waiting on a mutex (a futex wait). It never leaves this state:

  $ sudo strace -p 28489
  Process 28489 attached - interrupt to quit
  futex(0x7f2f7d4e8fb0, FUTEX_WAIT_PRIVATE, 2, NULL

We currently have two different scripts configured for the exec plugin on this machine. Both are short-lived (i.e. they run once and exit rather than looping and sleeping on INTERVAL):

  <Plugin exec>
    Exec "nobody" "/etc/collectd/exec/haproxy-fs.sh"
    Exec "nobody" "/etc/collectd/exec/ernie-fs.sh"
  </Plugin>

Any ideas what might be going on here, or what information I could provide to help find a root cause?

Thanks,
Ryan