On Mon, 2010-08-23 at 15:05 +0200, Sebastian Harl wrote:
> On Mon, Aug 23, 2010 at 02:50:33PM +0200, XANi wrote:
> > On Mon, 2010-08-23 at 13:42 +0200, Sebastian Harl wrote:
> > > On Mon, Aug 23, 2010 at 01:34:08PM +0200, XANi wrote:
> > > > On Mon, 2010-08-23 at 13:11 +0200, Sebastian Harl wrote:
> > > > > On Mon, Aug 23, 2010 at 04:02:57AM +0200, XANi wrote:
> > > > > > So after running something like:
> > > > > > while sleep 30 ; do /etc/init.d/collectd restart; done
> > > > > > after some time (sometimes few minutes sometimes an hour or more) i get
> > > > > > tons of collectd processes lying around (ive added output of ps aux as
> > > > > > attachment) and sometimes after restart.
> > > > > […]
> > > > > > It seems to trigger when both exec and unixsock plugins are on, if i
> > > > > > turn off one of them it works fine. Ah and im using 64 bit debian
> > > > > > testing.
> > > > >
> > > > > Uhm, strange. Could you please check (e.g. using "strace -p <pid>") what
> > > > > those collectd processes are doing? What's the parent of those processes
> > > > > (PPID in "ps ax -l" or use something like "ps axjf")? Are you able to
> > > > > kill those processes using signal SIGINT or SIGTERM?
> > > >
> > > > Ok so:
> > > > --
> > > > # ps ax |grep col
> > > > 4792 ? SLsl 0:00 /usr/sbin/collectd -C /etc/collectd/collectd.conf -P /var/run/collectd.pid
> > > > 4800 ? S 0:00 /usr/sbin/collectd -C /etc/collectd/collectd.conf -P /var/run/collectd.pid
> > > > --
> > > > as attachment result of strace -t -ff -o /tmp/4792 -p 4792 and
> > > > strace -t -ff -o /tmp/4800 -p 4800
> > > >
> > > > parent of PID 4800 is 4792
> > > > 4792 reacts on sigterm, 4800 both SIGTERM and SIGQUIT doesn't work, only
> > > > SIGKILL
> > > >
> > > > 4800.4800:
> > > > 13:25:33 futex(0x7fe9098f7550, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > >
> > > Thanks. Looks like some kind of deadlock :-/ I'll look into that.
> >
> > If u want i can give u access to VM with that bug already "trigerred"
> > and root access so u can install debug tools, just send me ur ssh pubkey
>
> Thanks. I'll have a look at the code first but I might come back to that
> offer after that ;-) Not quite sure when I'll have some time for that
> though. Possibly some time this week.
>
> Cheers,
> Sebastian

I've noticed it's much easier to trigger on a VM too (maybe because the host
is quite busy with other machines): on my desktop it sometimes takes an hour
or two to trigger, on the VM it triggers after a few minutes at most.

Also, I noticed that the "locked" process is running as the user I've told
the exec plugin to run the script as, so

Exec postfix "/usr/local/bin/a.pl"

results in:

template:~# ps aux |grep coll|grep -v grep
root     2469  0.0  0.2 162764  1436 ?  S<Lsl 15:22  0:00 /usr/sbin/collectd -C /etc/collectd/collectd.conf -P /var/run/collectd.pid
postfix  2476  0.0  0.2 101408  1168 ?  S<    15:22  0:00 /usr/sbin/collectd -C /etc/collectd/collectd.conf -P /var/run/collectd.pid

Hope that helps :)

--
Mariusz Gronczewski (XANi) <[email protected]>
GnuPG: 0xEA8ACE64
http://devrandom.pl
_______________________________________________ collectd mailing list [email protected] http://mailman.verplant.org/listinfo/collectd
