On Thu, 2010-02-25 at 08:15 -0300, Leonardo Chiquitto wrote:
> On Thu, Feb 25, 2010 at 11:15:31AM +0800, Ian Kent wrote:
> > On 02/23/2010 03:48 AM, Leonardo Chiquitto wrote:
> > > Hello,
> > >
> > > We have a user reporting periodic crashes in automount. The daemon gets
> > > killed by SIGBUS when returning from spawn_mount():
> > >
> > > Core was generated by `/usr/sbin/automount -p /var/run/automount.pid'.
> > > Program terminated with signal 7, Bus error.
> > > #0 0x0000555555566bd0 in spawn_mount (logopt=Cannot access memory at
> > > address 0x80004062242c
> > > ) at spawn.c:412
> > > 412 }
> > >
> > > 0x0000555555566bcd <spawn_mount+829>: mov %r12d,%eax
> > > 0x0000555555566bd0 <spawn_mount+832>: pop %rbx
> > > 0x0000555555566bd1 <spawn_mount+833>: pop %r12
> > > 0x0000555555566bd3 <spawn_mount+835>: pop %r13
> > > 0x0000555555566bd5 <spawn_mount+837>: pop %r14
> > > 0x0000555555566bd7 <spawn_mount+839>: pop %r15
> > > 0x0000555555566bd9 <spawn_mount+841>: leaveq
> > > 0x0000555555566bda <spawn_mount+842>: retq
> > >
> > > Is it possible that we're exceeding stack usage at this point, mostly
> > > due to the call to alloca()? Do you think we should replace alloca() with
> > > regular malloc() in spawn.c (patch below)?
> >
> > Does this patch actually resolve your customers' problem?
>
> Unfortunately I still don't know. Customer is currently running with
> a workaround (increased stack limit from 8k to 32k) to avoid the
> problem. I did some basic tests with the patch here but decided
> to wait for your comments before submitting a test package.
>
> > What is the version in use and what additional patches have been applied?
>
> They are running 5.0.3 plus all the patches in patch_order-5.0.3 and
> autofs-5.0.4-fix_negative_cache_non-existent_key.patch, meaning that we
> don't have the other alloca() replacements that went in after 5.0.4.

OK.

I had a bug report where the customer believed that the max open file
limit and stack size were the problem. It turned out that increasing
them, for some unknown reason, reduced the likelihood of the problem
occurring, but actually had nothing to do with the problem.

If automount crashes then you need to look at the gdb backtrace of the
running threads at the time of the crash with "thr a a bt" (short for
"thread apply all bt") to get more info. I don't know how you provide
debug symbols for your packages but you will need them if you want to
make any sense at all of the backtrace.

Is your customer using direct mounts?
Is your customer using LDAP?

Have a look at the patches below and try to work out if they are
relevant to the code base you are working with:

  autofs-5.0.4-fix-direct-map-cache-locking.patch
  autofs-5.0.4-fix-dont-umount-existing-direct-mount-on-reread.patch
  (of course this patch accompanies
  autofs-5.0.4-dont-umount-existing-direct-mount-on-reread.patch)
  autofs-5.0.4-fix-libxml2-non-thread-safe-calls.patch

There are also some other libxml2 patches, which took several tries to
get right, whose symptom is apparently random crashes:

  autofs-5.0.4-fix-dumb-libxml2-check.patch
  autofs-5.0.4-libxml2-workaround-fix.patch
  autofs-5.0.4-library-reload-fix-update-fix-2.patch
  autofs-5.0.4-library-reload-fix-update.patch
  autofs-5.0.4-library-reload-fix-update-fix.patch

Not sure about the order of these and what their dependencies are.
I think all the patches have reasonably good descriptions.
Of course most of this stuff isn't relevant if LDAP isn't being used.

Ian
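
For reference, the kind of alloca() to malloc() change raised in the quoted
question can be sketched roughly as below. This is not the patch from the
thread and not the actual spawn.c code: join_command() and its parameters
are hypothetical names used only for illustration. The point is just that
a heap allocation reports failure through a NULL return, whereas an
oversized alloca() grows the stack with no error path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from autofs): join a program path and one
 * argument into a single newly allocated string.
 *
 * With alloca() the buffer would live in the caller's stack frame and
 * could not be checked for failure. With malloc() the buffer lives on
 * the heap and allocation failure is visible as a NULL return.
 */
static char *join_command(const char *prog, const char *arg)
{
    /* prog + one space + arg + terminating NUL */
    size_t len = strlen(prog) + 1 + strlen(arg) + 1;
    char *buf = malloc(len);    /* previously: buf = alloca(len); */

    if (!buf)
        return NULL;            /* caller can log and fail cleanly */

    snprintf(buf, len, "%s %s", prog, arg);
    return buf;                 /* caller is responsible for free() */
}
```

A caller would check the result for NULL, use the string, and then free()
it. That extra free() on every exit path is the main cost of moving away
from alloca().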
