Re: NFS-related hang in 5.4?

Eirik Øverby Sun, 19 Jun 2005 17:13:02 -0700


On 19. jun. 2005, at 20.06, Robert Watson wrote:

On Sun, 19 Jun 2005, Eirik Øverby wrote:
When doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad.
Hmm. Looks like a bug in dummynet. ipfw should not be directly re-injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context.
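
To illustrate the difference between direct re-injection and deferring to a queue, here is a minimal user-space sketch in Python. It is only a toy model, not kernel code: `udp_lock`, `netisr_queue`, `reinject_direct`, `reinject_deferred` and `netisr_run` are hypothetical stand-ins, and it assumes the output path holds a single non-recursive lock that the input path also needs. The input side uses a non-blocking acquire so that the problem shows up as an error instead of a real hang.

```python
import threading
from collections import deque

# Hypothetical stand-ins; none of these are real kernel interfaces.
udp_lock = threading.Lock()   # models a non-recursive lock shared by both paths
netisr_queue = deque()        # models the queue drained later in netisr context
delivered = []                # packets that made it through the input path


def udp_input(pkt):
    """Input path: needs the lock. Fail fast instead of blocking forever."""
    if not udp_lock.acquire(blocking=False):
        raise RuntimeError("would block: lock already held by the output path")
    try:
        delivered.append(pkt)
    finally:
        udp_lock.release()


def udp_output(pkt, reinject):
    """Output path: holds the lock while the packet passes the firewall hook."""
    with udp_lock:
        reinject(pkt)


def reinject_direct(pkt):
    """Problematic variant: call the input path from inside the output path."""
    udp_input(pkt)


def reinject_deferred(pkt):
    """Safer variant: only enqueue; the input path runs later."""
    netisr_queue.append(pkt)


def netisr_run():
    """Drain the queue from a context that holds no output-path lock."""
    while netisr_queue:
        udp_input(netisr_queue.popleft())
```

Calling `udp_output("pkt", reinject_direct)` raises the `RuntimeError`, because the input path is entered while the output path still holds the lock. Calling `udp_output("pkt", reinject_deferred)` followed by `netisr_run()` delivers the packet, since the input path only runs after the lock has been released.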

This problem would exist across all 5.4 installations, both i386 and amd64? Would it depend on heavy load, or could it theoretically happen at any time when there's traffic? All three of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz) are experiencing random hangs with ~a few weeks between, impression is that if running single-cpu mode they are all stable. All using dummynet in a comparable manner. Ideas?

Is it possible to configure dummynet out of your configuration, and see if the problem goes away?


I'm running a test right now, will let you know in the morning.

Robert N M Watson
KDB trace:

db> trace
Tracing pid 56 tid 100064 td 0xc1a18600
kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
lapic_handle_intr(34) at lapic_handle_intr+0x3a
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d40000,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d40000,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,20940000,0,c1d58a80,7f4220) at transmit_event+0x107
ready_event_wfq(c1c64100,20940000,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---
I cannot seem to kill process 56 (nfsiod), so I have to reset the box.
Anyone got a clue? What can I do to ease debugging here? Next time it happens I can probably make a dump, at least I will have a debug kernel running then.
/Eirik
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-[EMAIL PROTECTED]"


