On Thu, 2007-12-20 at 18:30 -0800, Mike Marion wrote:
> In the last 2 days we've been seeing our autofs 5.0.2 daemon dumping
> core, and it seems to be triggered by a kill -HUP call to it to make
> it re-read the maps.  We're using all LDAP maps (and if HUP isn't
> needed there, we can turn it off), and it only seems to trigger if the
> daemon has been running for at least a few hours, as I can send it
> numerous HUP signals right after restarting it and it won't crash.

When it rains it pours.
Second SEGV report today.
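
For background, the HUP path itself is simple: the signal handler only
flags that the maps need re-reading, and the main loop does the actual
work, including the cache prune where you're crashing.  A minimal
sketch of that pattern (illustrative only, with assumed names, not the
actual automount source):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t need_reread;

static void hup_handler(int sig)
{
        (void)sig;
        need_reread = 1;        /* defer the real work to the main loop */
}

int main(void)
{
        struct sigaction sa = { 0 };

        sa.sa_handler = hup_handler;
        sigaction(SIGHUP, &sa, NULL);

        for (;;) {
                pause();        /* wait for a signal */
                if (need_reread) {
                        need_reread = 0;
                        /* re-read the maps, then prune cache entries
                         * that were not seen in the new map contents */
                        fprintf(stderr, "re-reading maps\n");
                }
        }
}

So the interesting part isn't the signal delivery, it's what the prune
does with entries that have gone stale since the last read.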

> 
> It looks like the HUP is making it try to shut down a subset of the
> paths (I sometimes see this in syslog without a segfault too), where
> it logs several entries of:
> 
>  automount[2475]: umounted direct mount <path>
> followed by the same paths in the same order:
>  automount[2475]: rmdir_path: lstat of <path> failed
> and then it core dumps:
>  automount[7419]: segfault at 00002aaaac141e08 rip 0000000000410d63 rsp
>  0000000040627030 error 4

There was a bug that caused the direct map to be pruned out of existence
when a server connection failed for some reason.  I don't remember
seeing a SEGV, although I wasn't paying attention to that when I worked
on it.
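
For what it's worth, the usual way a prune loop like that ends up
reading freed memory is by following an entry's next pointer after the
entry has been freed.  A minimal sketch of the safe shape (assumed
names and a simplified list, not the actual lookup.c code):

#include <stdlib.h>

struct entry {
        long age;               /* last time a map read saw this entry */
        struct entry *next;
};

/*
 * Remove every entry older than 'age'.  Unlinking through a
 * pointer-to-pointer and never touching 'e' after free() avoids the
 * use-after-free that shows up as a SEGV inside the loop -- the
 * classic bug is "for (e = head; e; e = e->next)" with a free(e)
 * somewhere in the body.
 */
static void prune_cache(struct entry **pp, long age)
{
        while (*pp) {
                struct entry *e = *pp;

                if (e->age < age) {
                        *pp = e->next;  /* unlink first ... */
                        free(e);        /* ... then free; never read e again */
                } else {
                        pp = &e->next;
                }
        }
}

Whether that's what is happening at lookup.c:1014 I can't say without
the matching source, but the backtrace is consistent with it.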

> 
> Sometimes that happens after one of the failed rmdir_path lines above,
> sometimes after most or all of them.
> 
> gdb shows them all crashing at the same point:
> 
> #0  lookup_prune_cache (ap=0x54ace0, age=1198202622) at lookup.c:1014
> 
> Unfortunately I don't have the exact patched copy of lookup.c the
> binary was built from; when I re-ran the build and looked at the file
> after rpm applied the patches, line 1014 didn't line up with anything
> (it was blank).
> 
> This has only cropped up in the last few days.
> 
> We're running SLES9-SP3 hosts with the 2.6.16.21-0.8 kernel from
> SLES10, rebuilt from its src.rpm with the autofs5 patch added.  The
> autofs itself is 5.0.2 with patches as of June of this year (I
> believe).

I'm not quite sure what that means, but this doesn't sound like a kernel
problem so far.

> 
> First possible thing that comes to mind:
> - Are our maps just too big now?  We have huge maps; a typical
>   /proc/mounts looks like this:
> 
> $ grep ^auto. /proc/mounts | wc
>    6940   41640  815531
> 
> Yes, we have almost 7000 mounts in the maps, and those are all direct
> mounts.  We have over 25,000 entries in our homedir map, but that's an
> indirect map.

That shouldn't be a problem, except that expires and map reads will take
much longer.  If there are problems with synchronization, I expect you
will see them before most others do.
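
To put a rough shape on that: every HUP re-read walks the whole cache
to age out stale entries, so the work is linear in the number of
entries.  A sketch of that walk (illustrative structure with assumed
names, not the real cache code):

#define HASH_SIZE 1024

struct mapent {
        long age;
        struct mapent *next;
};

static struct mapent *hash[HASH_SIZE];

/*
 * Count the entries a prune pass would remove.  The pass has to visit
 * every cached entry, so with ~7000 direct and ~25000 indirect entries
 * each HUP does tens of thousands of age checks before it completes.
 */
static unsigned long count_stale(long age)
{
        unsigned long stale = 0;
        int i;

        for (i = 0; i < HASH_SIZE; i++) {
                struct mapent *me;

                for (me = hash[i]; me; me = me->next)
                        if (me->age < age)
                                stale++;
        }
        return stale;
}

That's slow but not in itself fatal; the size mostly makes any race or
stale-pointer bug in the prune much easier to hit, which would fit the
"only after a few hours" pattern you're seeing.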

> 
> If one of the newer patches in the last few months might address this,
> I'll be happy to patch up.  

There are a lot of patches, about 62 now.
I need to consolidate them and release 5.0.3, but I'm still testing and
now have a couple more bugs.

I would prefer to work from fully patched source if possible.

Ian

