Quoting Oliver Fromme <[EMAIL PROTECTED]>:

Chris H. <[EMAIL PROTECTED]> wrote:
> Oliver Fromme wrote:
> > [...]
> > However, I don't think that your actual problem (lock-up
> > and panics) is related to rpc.lockd or rpc.statd.  It
> > rather sounds like something else is wrong with your
> > machine.  NFS works perfectly fine for me, including
> > copying huge files.
> >
> > You wrote that you had a lot of crashes that accumulated
> > many files in lost+found.  Well, maybe your filesystem
> > was somehow damaged in the process.  It is possible to
> > damage file systems in a way that can lead to panics, and
> > it's not necessarily detected and repaired by fsck.
>
> Indeed. I /too/ considered this. However, I largely dismissed it
> as a possibility, as almost all of the files are zero-length. The others
> are fragments of logs. I'm not /completely/ ruling this out though.

The files in lost+found aren't the problem.  The problem
is the things that you cannot see, and fsck won't move
those to lost+found.

In particular, if you use softupdates on drives that have
write-caching enabled, or on drives that illegally cache
data even if it's disabled (be it intentionally or because
of bugs in the firmware), it's almost guaranteed that the
FS will take damage beyond repair on a crash, and even more
so after several crashes.

Another potential cause of problems is the background fsck
feature in FreeBSD 6.  I'm not sure if it has been fixed
in 6-stable, maybe it has.  I don't want to spread FUD.
But in the past, if a machine crashed and rebooted during
a background fsck, that was almost a guarantee for damage
beyond repair, too.  That's why I always disable background
fsck on my machines.  (Let me repeat:  It _might_ be fixed
in 6-stable, I don't know.  I haven't seen a definitive
confirmation of it being fixed on the mailing lists so
far.  If somebody knows otherwise, please correct me.)

Greetings, and thank you for your thoughtful reply.
Understood on all points. As mentioned, I wasn't /completely/
ruling that out. I have always refused to permit background fsck.
/Not/ because of any lack of faith in FBSD. Frankly, I
have nothing /but/ faith - perhaps more than I ought to. Rather,
it is because I insist on keeping tabs on what's going on
/at all times/. So, should the system crash, shut down, or halt
for any reason, the BIOS will keep it in a "shutdown" state should
it gain control. In the case of a kernel reboot/crash, the loader
simply sits and awaits my confirmation before starting the system.
That way I am always guaranteed the opportunity to start in single
user mode and answer any anomalies that the system reports with
an affirmative/negative.
So, in summary, I am /not/ completely ruling out your suggestion that
irreparable damage has been done to the file system by the multitude of
crashes imposed upon it. I am also grateful for your taking the time to share
your experiences and insight with me. I simply haven't found anything
/definitive/ yet. Kris might argue here that NFS seems to be working
fine for everyone else, which would also lend credence to your theory.
Both of you may indeed be correct. :)
I just think it'd be worth the time to follow through and make a dump
device and crash it to find the /definitive/ reason for this. It may
in fact turn out to be some obscure, near-impossible anomaly in the NFS
code that /I/ was just (un)lucky enough to stub my toe on. :)
At any rate, as this is a production server - and a /really/ busy one at
that - I want to get a (confirmed) good backup off of it before willingly
bashing it any further. It currently serves the largest Netscape browser
client archive on the net. They are all the 0.x - 4.x series browser
clients. You'd be amazed how popular they are and how many people still
use them. So as backing it up onto the NFS mounted backup server is
currently out of the question, and there's more than a terabyte of browser
clients alone, it's going to take me a little longer to follow through with
the dump device > crash > dump > backtrace than it would otherwise - but
it will be done. :)

Thank you again for taking the time to share your thoughts, suggestions
and experiences. I really appreciate it.

--Chris


Best regards
  Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"Python is an experiment in how much freedom programmers need.
Too much freedom and nobody can read another's code; too little
and expressiveness is endangered."
       -- Guido van Rossum
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"




--
panic: kernel trap (ignored)



-----------------------------------------------------------------
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/////////////////////////////////////////////////////////////////

