> > > What is the version in use and what additional patches have been applied?
> >
> > They are running 5.0.3 plus all the patches in patch_order-5.0.3 and
> > autofs-5.0.4-fix_negative_cache_non-existent_key.patch, meaning that we
> > don't have the other alloca() replacements that went in after 5.0.4.
>
> OK.
>
> I had a bug report where the customer believed that the max open file
> limit and stack size was a problem. It turned out that increasing them,
> for some unknown reason reduced the likelihood of the problem occurring,
> but actually had nothing to do with the problem.
Increasing the stack size definitely helped here too. The customer is no
longer seeing the problem, and now that there is a workaround it is
harder to keep asking them for more tests. I spent a lot of time trying
to reproduce the problem in house to make testing easier, but even with
a very similar setup (LDAP plus thousands of mount points) I was not
able to make it crash.
> If automount crashes then you need to look at the gdb backtrace of the
> running threads at the time of the crash with "thr a a bt" to get more
> info. I don't know how you provide debug symbols for your packages but
> you will need them if you want to make any sense at all of the backtrace.
All threads look all right, except for thread 1, which apparently has a
corrupted stack (and hence caused the SIGBUS):
(gdb) thr a a bt
Thread 7 (Thread 3577):
#0 0x00002b39dd901a48 in do_sigwait () from /lib64/libpthread.so.0
#1 0x00002b39dd901aed in sigwait () from /lib64/libpthread.so.0
#2 0x000055555555d6aa in statemachine (arg=<value optimized out>)
at automount.c:1382
#3 main (arg=<value optimized out>) at automount.c:2105
Thread 6 (Thread 3578):
#0 0x00002b39dd8fe517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x0000555555571802 in alarm_handler (arg=<value optimized out>)
at alarm.c:203
#2 0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#3 0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6
Thread 5 (Thread 3579):
#0 0x00002b39dd8fe517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x000055555556b72d in st_queue_handler (arg=<value optimized out>)
at state.c:1022
#2 0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#3 0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6
Thread 4 (Thread 3582):
#0 0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1 0x000055555555f2f7 in get_pkt (pkt=<value optimized out>,
ap=<value optimized out>) at automount.c:925
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1082
#3 handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1581
#4 0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5 0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6
Thread 3 (Thread 3585):
#0 0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1 0x000055555555f2f7 in get_pkt (pkt=<value optimized out>,
ap=<value optimized out>) at automount.c:925
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1082
#3 handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1581
#4 0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5 0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6
Thread 2 (Thread 3586):
#0 0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1 0x000055555555f2f7 in get_pkt (pkt=<value optimized out>,
ap=<value optimized out>) at automount.c:925
#2 handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1082
#3 handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
at automount.c:1581
#4 0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5 0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6
Thread 1 (Thread 11657):
Cannot access memory at address 0x800040623598
(gdb) thr 1
[Switching to thread 1 (Thread 11657)]#0 0x0000555555566bd0 in
spawn_mount (
logopt=Cannot access memory at address 0x80004062242c
) at spawn.c:412
412 }
(gdb) info registers
rax 0x0 0
rbx 0x406223f0 1080173552
rcx 0x1 1
rdx 0x0 0
rsi 0x0 0
rdi 0x1 1
rbp 0x800040623590 0x800040623590
rsp 0x800040623568 0x800040623568
r8 0x1 1
r9 0x2d89 11657
r10 0x8 8
r11 0x246 582
r12 0x0 0
r13 0x0 0
r14 0x2 2
r15 0x406223b0 1080173488
rip 0x555555566bd0 0x555555566bd0 <spawn_mount+832>
eflags 0x10287 [ CF PF SF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x63 99
gs 0x0 0
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x1f80 [ IM DM ZM OM UM PM ]
> Is your customer using direct mounts?
Yes, lots of direct mounts (more than 9000).
> Is your customer using LDAP?
Yes, all maps are retrieved from LDAP.
> Have a look at the patches below and try and work out if they are
> relevant to the code base you are working with:
Thanks a lot for the useful comments and for listing the patches. I'll
try to merge them into our package.
Kind regards,
Leonardo
_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs