On Mon, 2008-06-30 at 10:08 +0200, Carsten Aulbert wrote:
> Hi again,
> 
> last weekend went quite bad for our nodes and autofs seem to die quite often. 
> Caveat: We had some networking problems during the weekend as well so this 
> might
> be triggered by failing network connections:

I have been made aware of this bug recently.

> 
> Jun 28 01:05:17 n1035 mountd[32688]: authenticated mount request from 
> 10.10.11.47:603 for /local (/local)
> Jun 28 01:05:24 n1035 mountd[32665]: authenticated mount request from 
> 10.10.10.33:644 for /local (/local)
> Jun 28 01:05:32 n1035 kernel: [1956924.510009] Unable to handle kernel NULL 
> pointer dereference at 0000000000000028 RIP:
> Jun 28 01:05:32 n1035 kernel: [1956924.510027]  [<ffffffff802710c4>] 
> fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.510466] PGD d0a52067 PUD d0841067 PMD 0
> Jun 28 01:05:32 n1035 kernel: [1956924.510621] Oops: 0002 [1] SMP
> Jun 28 01:05:32 n1035 kernel: [1956924.510773] CPU 3
> Jun 28 01:05:32 n1035 kernel: [1956924.510919] Modules linked in: nfs nfsd 
> lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ipmi_si ipmi_devintf i
> pmi_watchdog ipmi_poweroff ipmi_msghandler i2c_i801 8250_pnp 8250 i2c_core 
> e1000 serial_core
> Jun 28 01:05:32 n1035 kernel: [1956924.511407] Pid: 25000, comm: get_file Not 
> tainted 2.6.24.4-nodes #1
> Jun 28 01:05:32 n1035 kernel: [1956924.511567] RIP: 0010:[<ffffffff802710c4>] 
>  [<ffffffff802710c4>] fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.511862] RSP: 0018:ffff8100e792b950  
> EFLAGS: 00010246
> Jun 28 01:05:32 n1035 kernel: [1956924.512014] RAX: 0000000000000001 RBX: 
> ffff81020f26d180 RCX: ffff810216941408
> Jun 28 01:05:32 n1035 kernel: [1956924.512307] RDX: fffffffffffffe00 RSI: 
> 0000000000000206 RDI: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.512608] RBP: 0000000000000000 R08: 
> ffff8100e792a000 R09: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.512902] R10: 0000000000000000 R11: 
> ffff810216ae2e80 R12: ffff81020b2a8a80
> Jun 28 01:05:32 n1035 kernel: [1956924.513197] R13: ffff8100e792b9e8 R14: 
> ffff810000000000 R15: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.513492] FS:  0000000040401960(0063) 
> GS:ffff810217c771c0(0000) knlGS:0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.513791] CS:  0010 DS: 0000 ES: 0000 
> CR0: 000000008005003b
> Jun 28 01:05:32 n1035 kernel: [1956924.513945] CR2: 0000000000000028 CR3: 
> 00000000e4c85000 CR4: 00000000000006e0
> Jun 28 01:05:32 n1035 kernel: [1956924.514239] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.514535] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Jun 28 01:05:32 n1035 kernel: [1956924.514831] Process get_file (pid: 25000, 
> threadinfo ffff8100e792a000, task ffff8100d3005850)
> Jun 28 01:05:32 n1035 kernel: [1956924.515128] Stack:  ffffffff8806dc26 
> ffff81020f26d180 ffff81012d1db5c0 0000000000000110
> Jun 28 01:05:32 n1035 kernel: [1956924.515433]  ffffffff8806e222 
> 000000011781cb40 ffff81020b2a8a80 ffff81021781cb40
> Jun 28 01:05:32 n1035 kernel: [1956924.515734]  ffff810217c2d680 
> ffff8100e792ba18 0000000000000000 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.515895] Call Trace:
> Jun 28 01:05:32 n1035 kernel: [1956924.516186]  [<ffffffff8806dc26>] 
> :autofs4:autofs4_catatonic_mode+0x5e/0x6c
> Jun 28 01:05:32 n1035 kernel: [1956924.516346]  [<ffffffff8806e222>] 
> :autofs4:autofs4_wait+0x5a2/0x76d
> Jun 28 01:05:32 n1035 kernel: [1956924.516509]  [<ffffffff8027ec71>] 
> dput+0x1c/0x10b
> Jun 28 01:05:32 n1035 kernel: [1956924.516667]  [<ffffffff8806ccfd>] 
> :autofs4:try_to_fill_dentry+0x6d/0xf1
> Jun 28 01:05:32 n1035 kernel: [1956924.516826]  [<ffffffff8806cf7b>] 
> :autofs4:autofs4_revalidate+0xa6/0x17e
> Jun 28 01:05:32 n1035 kernel: [1956924.516983]  [<ffffffff8031ef47>] 
> xfs_vn_permission+0x0/0x17
> Jun 28 01:05:32 n1035 kernel: [1956924.517139]  [<ffffffff8806d35f>] 
> :autofs4:autofs4_lookup+0x289/0x33e
> Jun 28 01:05:32 n1035 kernel: [1956924.517297]  [<ffffffff80276cc6>] 
> do_lookup+0xc2/0x1b8
> Jun 28 01:05:32 n1035 kernel: [1956924.517452]  [<ffffffff8027725e>] 
> __link_path_walk+0x365/0xd0a
> Jun 28 01:05:32 n1035 kernel: [1956924.517627]  [<ffffffff8811abc6>] 
> :nfs:nfs_writepages_callback+0x0/0x1e
> Jun 28 01:05:32 n1035 kernel: [1956924.517785]  [<ffffffff80277c52>] 
> link_path_walk+0x4f/0xcb
> Jun 28 01:05:32 n1035 kernel: [1956924.517941]  [<ffffffff8026f7cc>] 
> get_unused_fd_flags+0x77/0x112
> Jun 28 01:05:32 n1035 kernel: [1956924.518097]  [<ffffffff80277e6f>] 
> do_path_lookup+0x1a1/0x1c5
> Jun 28 01:05:32 n1035 kernel: [1956924.518252]  [<ffffffff80277fa8>] 
> __path_lookup_intent_open+0x50/0x8d
> Jun 28 01:05:32 n1035 kernel: [1956924.518409]  [<ffffffff802787e2>] 
> open_namei+0x86/0x606
> Jun 28 01:05:32 n1035 kernel: [1956924.518568]  [<ffffffff8026f652>] 
> do_filp_open+0x1c/0x3d
> Jun 28 01:05:32 n1035 kernel: [1956924.518723]  [<ffffffff8026f7cc>] 
> get_unused_fd_flags+0x77/0x112
> Jun 28 01:05:32 n1035 kernel: [1956924.518880]  [<ffffffff8026f979>] 
> do_sys_open+0x46/0xca
> Jun 28 01:05:32 n1035 kernel: [1956924.519034]  [<ffffffff8020be0e>] 
> system_call+0x7e/0x83
> Jun 28 01:05:32 n1035 kernel: [1956924.519187]
> Jun 28 01:05:32 n1035 kernel: [1956924.519331]
> Jun 28 01:05:32 n1035 kernel: [1956924.519332] Code: f0 ff 4f 28 0f 94 c0 84 
> c0 74 05 e9 92 fe ff ff c3 65 48 8b
> Jun 28 01:05:32 n1035 kernel: [1956924.519813] RIP  [<ffffffff802710c4>] 
> fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.519967]  RSP <ffff8100e792b950>
> Jun 28 01:05:32 n1035 kernel: [1956924.520115] CR2: 0000000000000028
> Jun 28 01:05:32 n1035 kernel: [1956924.521139] ---[ end trace 
> 436b46a0e9d6c8a6 ]---
> 
> We get several of these after one antoher. A little bit later we get:

I'm not sure we really want to try go further with the stuff below until
the breakage in the kernel is fixed.

> 
> Jun 28 01:05:32 n1035 rpc.idmapd[2133]: nfsopen: 
> open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:32 n1035 last message repeated 2 times
> Jun 28 01:05:35 n1035 mountd[32665]: authenticated unmount request from 
> 10.10.8.70:867 for /local (/local)
> Jun 28 01:05:50 n1035 mountd[32627]: authenticated unmount request from 
> 10.10.13.37:600 for /local (/local)
> Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: 
> open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:52 n1035 last message repeated 2 times
> Jun 28 01:05:52 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
> Jun 28 01:05:52 n1035 last message repeated 5 times
> Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: 
> open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:53 n1035 last message repeated 10 times
> Jun 28 01:05:53 n1035 automount[31737]: >> mount: n1037:/local failed, reason 
> given by server: Permission denied
> Jun 28 01:05:53 n1035 automount[31756]: >> mount: n0197:/local failed, reason 
> given by server: Permission denied
> Jun 28 01:05:53 n1035 automount[31743]: >> nfs bindresvport: Address already 
> in use
> Jun 28 01:05:53 n1035 automount[31766]: >> mount: n0365:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31754]: >> mount: n0717:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31778]: >> mount: n0986:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31739]: >> mount: n1315:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31774]: >> mount: n0919:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31781]: >> mount: n0006:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31741]: >> mount: n0592:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31753]: >> mount: n1080:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31757]: >> mount: n0168:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
> Jun 28 01:05:53 n1035 automount[31787]: >> mount: n0677:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31785]: >> mount: n0806:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31760]: >> mount: n0868:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31756]: mount(nfs): nfs: mount failure 
> n0197:/local on /atlas/node/n0197
> Jun 28 01:05:53 n1035 automount[31737]: mount(nfs): nfs: mount failure 
> n1037:/local on /atlas/node/n1037
> Jun 28 01:05:53 n1035 automount[31745]: >> mount: n1105:/local: can't read 
> superblock
> Jun 28 01:05:53 n1035 automount[31743]: mount(nfs): nfs: mount failure 
> n0937:/local on /atlas/node/n0937
> Jun 28 01:05:53 n1035 rpc.idmapd[2133]: nfsopen: 
> open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:53 n1035 automount[31753]: mount(nfs): nfs: mount failure 
> n1080:/local on /atlas/node/n1080
> 
> 
> I guess since the mounts are not umounted anymore that should explain why we 
> run out of ports.
> 
> Any suggestions? Do you know this error?
> 
> Again these boxes are all Debian Etch with 2.6.24.4 kernel (vanilla). and 
> autof  4.1.4-13 on amd64.

I can produce patches for 2.6.24.4 if you would like to test them.
They are quite recent and haven't been posted for inclusion in any
kernel yet. I would also include some other patches, that I hope will
get into mainline, in the series.

Ian


_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs

Reply via email to