Re: Reproducable, possibly NFS related, fatal double fault in 6.2-R-p7

Chris H. Tue, 30 Oct 2007 20:59:21 -0800

Quoting Kris Kennaway <[EMAIL PROTECTED]>:

Chris H. wrote:

Quoting Kris Kennaway <[EMAIL PROTECTED]>:

Clifton Royston wrote:

On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:

excerpt from this list titled: NFS == lock && reboot, that I posted follows:
------8<---SNIP---8<-----SNIP-----8<-------
# uname -a
FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 26 16:27:14 PST 2007
Greetings,
Does anyone know when NFS and friends will be working again? I haven't
been able to /safely/ use it from 4.8 on. I remember some talk on the
list some time ago, and then it seemed to be resolved, as the discussion
ended. So I thought it was fixed. Seems not. :(

My scenario;
mount host off root:
mount script exec'd follows...

#!/bin/sh -
mount -t nfs host.domain.tld:/ /host
mount -t nfs host.domain.tld:/var /host/var

confirm mount...

# ls /host
.snap    COPYRIGHT    bin
...
usr    var    tmp

OK looks good...

# cp /path/to/approx/10Mb/file /host/path/to/dest/dir/

Fatal double fault
eis 0x0blah
eiblah blah0x
panic double fault
no dump device defined
rebooting in 15sec...

Hmmm... that's not good. :(

------8<---SNIP---8<-----SNIP-----8<-------

My final solution was to change the lines in /etc/rc.conf
from:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

to:
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
#rpc_lockd_enable="YES"
#rpc_statd_enable="YES"
rpcbind_enable="YES"
Making those changes ended the "Fatal double fault && reboot in 15 seconds..."
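
The rc.conf change quoted above can also be scripted. What follows is
only a minimal sketch, not something posted in the thread: the function
name disable_nfs_locking is hypothetical, and it assumes the two
variables appear at the start of a line exactly as shown in the excerpt.
It comments out rpc_lockd_enable and rpc_statd_enable in the file it is
given and leaves every other line alone.

```sh
#!/bin/sh
# Hypothetical helper (not from the original post): comment out the
# rpc_lockd/rpc_statd knobs in an rc.conf-style file given as $1.
disable_nfs_locking() {
    conf="$1"
    tmp="${conf}.new"
    # Prefix the two assignments with '#'; '&' re-inserts the matched
    # text. Lines that are already commented out do not match.
    sed -e 's/^rpc_lockd_enable=/#&/' \
        -e 's/^rpc_statd_enable=/#&/' \
        "$conf" > "$tmp" && mv "$tmp" "$conf"
}
```

It is safer to try this on a copy first, for example
"cp /etc/rc.conf /tmp/rc.conf.test" followed by
"disable_nfs_locking /tmp/rc.conf.test". Editing the file does not stop
daemons that are already running.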


  Thanks for this very timely mention!  The cluster of servers I am
about to upgrade from 4.8 <embarrassed cough> to 6.2 relies heavily on
NFS to an old Netapp.  If I have got to disable rpc_lockd and
rpc_statd, it's good to know that now!
   Can I ask, can anybody confirm that they're running 6.2 on NFS
successfully *with* lockd and statd?

Er, yes, of course it does. The old message he is quoting is bogus on its own,

While I'll grant you that I haven't *yet* found/taken the time to create a
dump device and re-enable rpc_lockd && rpc_statd && cp 10Mb file to mount
point to produce an *instantaneous* "Fatal double fault". I don't think it's
fair to label my original post entirely /bogus/ - especially in light of
the recent post I replied to. Which seems to have some very common ground.
I should probably mention that since my last posting (my original thread),
I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
enabled. Yet none of them produce a "Fatal double fault". They are all
Tyan SMP boards with dual onboard fxp's - as opposed to the Nvidia UP
which has a single onboard nve.   They are all inter-connected via NFS.
I have a 750Gb drive hanging off the /problematic/ Nvidia board, that I
had intended to use for NFS back-up's. But given the NFS issue I had with
it, it didn't seem to be the best solution. If anyone felt like throwing
me a "cheat sheet" for creating a dump device out of that drive and a
"quickie" for producing a backtrace. I'm sure I'd be better able to find
the required time to produce the required information. I'm sorry. It's
just that I'm a hundred million miles away from that right now. As I've
been building several large web applications, and their deadline is fast
approaching. FWIW I bounced all the servers today, and therefore have
recent /verbose/ dmesg's. Should any of the information they provide, be
of any help/use to anyone.

Take care. :)


http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

It's very unlikely NFS is relevant to the problem (which is what made it
bogus, together with the lack of debugging) and likely that nve is the
cause. The above URL explains in detail how to obtain the necessary
debugging to confirm this.


Kris

Thank you Kris,
I was recently able to find a small window in my workload. So I decided to
use it to provide the "non-bogus" ;) information needed. After reading:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
a few days ago, I was only unclear on one point in setting up the required
environment. So I posted my question to the list, "dumpdev question
(probably stupid)", which Andrey V. Elsukov immediately responded to.

I'll be creating a Crash Dump in the next couple of days. So if it's not
already abundantly clear that this is the first time I've attempted to
produce this information - now would be the perfect time to /enlighten/ me
as to anything you can think of that will ensure you get the information
you're looking for. :)
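
For a first crash dump, the setup described in the kerneldebug chapter
linked above comes down to a couple of rc.conf settings plus a kernel
built with debug symbols. The fragment below is only a sketch under
stated assumptions, not something posted in the thread: it assumes the
installed rc.conf(5) accepts dumpdev="AUTO" and that the swap partition
is large enough to hold a dump of RAM. The kernel path and the KERNCONF
name in the comments are placeholders.

```sh
# /etc/rc.conf fragment (sketch, not from the original thread)

# Use the first suitable swap device listed in /etc/fstab as the dump
# device. Assumption: "AUTO" is supported by this release; if it is
# not, name the swap partition explicitly instead.
dumpdev="AUTO"

# Directory where savecore(8) writes vmcore.N on the boot after a panic.
dumpdir="/var/crash"

# After the panic and reboot, as root (KERNCONF is a placeholder):
#   kgdb /usr/obj/usr/src/sys/KERNCONF/kernel.debug /var/crash/vmcore.0
#   (kgdb) bt
# kernel.debug only exists if the kernel was built with
# "makeoptions DEBUG=-g" in its configuration file.
```

The "bt" output from kgdb is the backtrace asked for earlier in the
thread; posting it together with the panic message is usually the most
useful starting point.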

Thank you again for your reply.

--Chris

--
panic: kernel trap (ignored)



_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
