(Apologies for the M$ formatting--haven't figured out how to make the 
company-provided email client quote/reply properly.)

Thanks for the info that the dlopen'ed shared object is not closed but remains 
open when fork() is called.  That and the #4578 glibc bug report do resonate 
with this situation.  Pressing '1' while running 'top' on the test machine 
shows 32 logical cpus, numbered 0 through 31, inclusive, so SMP-related 
concurrency issues definitely appear possible.  According to 'dpkg', it appears 
I'm running glibc 2.15-0ubuntu10.3 on this Ubuntu 12.04.2 system (plus 
updates).  The assertion failure happens around 30-50% of the time I attempt to 
start collectd with the exec plugin enabled.  I can almost always get a failure 
within about 2-7 attempts.
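As a rough consistency check, assume each start-up attempt fails independently with probability p (an assumption; real failures may be correlated). The chance of hitting the assertion at least once within n attempts is then:

```latex
\begin{aligned}
P_n &= 1 - (1 - p)^{n} \\
p = 0.3,\ n = 7 &: \quad 1 - 0.7^{7} \approx 1 - 0.082 = 0.918 \\
p = 0.5,\ n = 7 &: \quad 1 - 0.5^{7} = 1 - \tfrac{1}{128} \approx 0.992 \\
p = 0.3,\ n = 2 &: \quad 1 - 0.7^{2} = 0.51
\end{aligned}
```

So a 30-50% per-attempt failure rate fits "almost always" seeing a failure within 2-7 attempts.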

Is there a possibility that the dlopen'ed shared object could be finalized or 
tidied up before the fork()?  I would think there must be lots of other 
programs out there that dlopen() a shared object and then later call fork() and 
exec...().
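The ordinary pattern asked about here (dlopen() early, then fork() and exec later) can be sketched as below. This is a minimal single-threaded illustration, not collectd's code: the function name run_after_dlopen() is hypothetical, and the library name "libm.so.6" and the path /bin/sh are assumptions that hold on typical glibc-based Linux systems such as Ubuntu.

```c
/* Hypothetical sketch: dlopen() a library, keep it open, then fork()
 * and exec a shell command.  "libm.so.6" and /bin/sh are assumptions
 * about a typical glibc-based Linux system. */
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status, or -1 on error. */
int run_after_dlopen(const char *shell_cmd)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image immediately; the inherited
         * handle simply disappears with the old image. */
        execl("/bin/sh", "sh", "-c", shell_cmd, (char *)NULL);
        _exit(127); /* only reached if execl() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    /* The handle deliberately stays open in the parent. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_after_dlopen("exit 3") returns 3. In a single-threaded process like this one, an open handle at fork() time is generally not a problem by itself; the concern in this thread is what happens when some other thread is inside the dynamic loader at the moment fork() runs.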

Thanks,

Robert Riches

-----Original Message-----
From: Florian Forster [mailto:[email protected]] 
Sent: Thursday, February 21, 2013 2:04 AM
To: Riches Jr, Robert M
Cc: [email protected]
Subject: Re: [collectd] randomly getting dl_open_worker assertion

Hi Robert,

On Wed, Feb 20, 2013 at 04:21:16PM +0000, Riches Jr, Robert M wrote:
> [2013-02-20 08:01:31] exec plugin: exec_read_one: error = Inconsistency 
> detected by ld.so: dl-open.c: 221: dl_open_worker: Assertion 
> `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

This appears to be an assertion within glibc's implementation of dlopen(3). [0]
It looks like this bug from 2007 could be related: [1]

> There doesn't seem to be any rhyme or reason as to whether I get the 
> expected result or the assertion failure.  I've googled for answers 
> until my keyboard is wearing out, but nothing has come up that shows 
> promise of a solution.

From what you describe, it feels like a concurrency issue. collectd uses 
dlopen() to load the plugins, including the exec plugin. This happens at 
start-up only; afterwards the mechanism is no longer used, but the dlopen'ed 
shared objects are never closed, so they are still open when fork() is called.
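The start-up loading described above (open each plugin once, never close it) might look roughly like the following. This is a hypothetical sketch: load_plugin(), MAX_PLUGINS and plugin_handles are illustrative names, not collectd's actual API, and a system library is used as a stand-in for a plugin.

```c
/* Hypothetical sketch of start-up plugin loading: every object is
 * dlopen()ed once and its handle is never passed to dlclose().
 * load_plugin(), MAX_PLUGINS and plugin_handles are illustrative. */
#include <dlfcn.h>
#include <stdio.h>

#define MAX_PLUGINS 16

/* Handles kept open for the lifetime of the process. */
static void *plugin_handles[MAX_PLUGINS];
static int plugin_count = 0;

/* Opens `path`, looks up `symbol`, and returns its address (NULL on error). */
void *load_plugin(const char *path, const char *symbol)
{
    if (plugin_count >= MAX_PLUGINS)
        return NULL;

    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before dlsym() */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", symbol, err);
        dlclose(handle); /* failed load: nothing to keep */
        return NULL;
    }

    plugin_handles[plugin_count++] = handle; /* deliberately never closed */
    return sym;
}
```

As a stand-in for a real plugin, load_plugin("libm.so.6", "cos") returns the address of cos(); the assignment `*(void **)(&fn) = ptr;` is the usual POSIX idiom for turning that pointer into a callable `double (*fn)(double)`.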

> Regarding the behavior when I run the real script that doesn't send 
> anything to stderr, […]

I don't think this is related to I/O. It sounds more like a problem between 
dlopen() and fork().
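One way to probe a suspected dlopen()/fork() race is a stress test along these lines. It is a hypothetical sketch, not code from collectd or glibc: a background thread keeps loading and unloading a library while the main thread forks, and each child calls dlopen() itself, which is where an inconsistent loader state inherited from the parent would surface. POSIX only guarantees async-signal-safe functions in the child of a multithreaded process, so the child's dlopen() may succeed, fail an ld.so assertion, or hang; hung children are killed after about two seconds. The library name "libm.so.6" is an assumption (any library the program has not otherwise loaded should do).

```c
/* Hypothetical dlopen()/fork() stress test.  STRESS_LIB is an
 * assumption: any library not otherwise loaded by the program. */
#include <dlfcn.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define STRESS_LIB "libm.so.6"

static atomic_int keep_loading;

/* Background thread: load and unload as fast as possible so that
 * fork() has a chance of landing in the middle of a dlopen(). */
static void *loader_thread(void *arg)
{
    (void)arg;
    while (atomic_load(&keep_loading)) {
        void *h = dlopen(STRESS_LIB, RTLD_NOW);
        if (h != NULL)
            dlclose(h);
    }
    return NULL;
}

/* Polls `pid` for about 2 s, then kills it; returns the wait status. */
static int wait_with_timeout(pid_t pid)
{
    int status = 0;
    struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    for (int i = 0; i < 200; i++) {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return status;
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return status;
}

/* Forks `iterations` children and returns how many did not exit with
 * status 0 (error, signal or timeout), or -1 on setup failure. */
int count_bad_children(int iterations)
{
    pthread_t tid;
    int bad = 0;

    atomic_store(&keep_loading, 1);
    if (pthread_create(&tid, NULL, loader_thread, NULL) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            bad = -1;
            break;
        }
        if (pid == 0) {
            /* Child: deliberately enter the dynamic loader before exiting. */
            void *h = dlopen(STRESS_LIB, RTLD_NOW);
            _exit(h != NULL ? 0 : 1);
        }
        int status = wait_with_timeout(pid);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }

    atomic_store(&keep_loading, 0);
    pthread_join(tid, NULL);
    return bad;
}
```

A nonzero result from, say, count_bad_children(1000) on an affected machine would support the race theory; zero failures on a given glibc version proves nothing either way.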

How many processors does the machine have on which this problem occurs?
Which libc are you using? Approximately, how often does this happen?
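The first two questions can also be answered from a program on a glibc system, as in this sketch. gnu_get_libc_version() is a glibc extension, so the code assumes glibc and will not build against other C libraries; the helper names are hypothetical.

```c
/* Hypothetical helpers to report CPU count and glibc version.
 * Requires glibc (gnu_get_libc_version() is a GNU extension). */
#include <gnu/libc-version.h>
#include <stdio.h>
#include <unistd.h>

/* Number of processors currently online. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Upstream version of the glibc actually in use at run time (for example
 * "2.15"); distribution package revisions are not part of this string. */
const char *running_libc_version(void)
{
    return gnu_get_libc_version();
}

void print_environment(void)
{
    printf("online CPUs: %ld\n", online_cpus());
    printf("glibc:       %s\n", running_libc_version());
}
```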

Best regards,
—octo

[0] <http://code.woboq.org/userspace/glibc/elf/dl-open.c.html#259>
[1] <http://www.sourceware.org/bugzilla/show_bug.cgi?id=4578>
--
collectd – The system statistics collection daemon
Website: http://collectd.org
Google+: http://collectd.org/+
GitHub:  https://github.com/collectd
Twitter: http://twitter.com/collectd
_______________________________________________
collectd mailing list
[email protected]
http://mailman.verplant.org/listinfo/collectd