>
> You wrote:
> > From: Kwon Oh-hoon <[EMAIL PROTECTED]>
> > Message-Id: <[EMAIL PROTECTED]>
> > Subject: [Q] Why salvaging server occurs frequently??
> > To: [EMAIL PROTECTED]
> > Date: Wed, 25 Mar 1998 13:58:15 +0000 (KST)
> > Content-Type: text/plain; charset=EUC-KR
> > Sender: [EMAIL PROTECTED]
> >
> >
> > We have three database servers on alpha_osf32 platforms.
> > The AFS product version on these servers is afs3.4 5.38.
> > Our three database servers are also file servers.
> > Because the file server on these DB servers has to be salvaged so often,
> > all users in our cell have to stop working almost every day.
> >
> > In the FileLog.old file, I found the error message "file assertion failed".
> > To solve this problem, we upgraded our database servers from afs3.4 4.35
> > to afs3.4 5.38, but the error occurred again.
> >
> > I think this problem started after we began using "vos backupsys"
> > for daily backups of all volumes.
> >
> > Log files are in ftp.transarc.com:/pub/afsps/ftp/pohang-univ :
> > FileLog, SalvageLog, FileLog.old, SalvageLog.old, core.file.fs
> >
> > Question 1) Why does the salvager run so frequently in this case?
> > How can this "file assertion failed" error be solved?
> > Question 2) Can the /vicepx/V0xxxxxxx.vol file be removed manually?
> > The volume is not in the VLDB, and the command "vos zap" does not remove it.
>
> I assume you mean the files under:
> /afs/transarc.com/public/anon-ftp/pub/afsps/ftp/pohang-univ
> As you noted, the important message (why it failed) is:
> Assertion failed! file afsfileprocs.c, line 6016.
> To really be sure what this means, it's necessary to contact your
> Transarc customer support representative. Assuming, however, that
> the build for "afs 3.4 5.38" contains this ident line in "fileserver":
> $Header: /afs/transarc.com/project/fs/dev/afs/3.4/.stage13/rcs/viced/RCS/afsfileprocs.c,v 2.453 1997/09/26 19:08:18 chengjie Exp $
> then the assertion on line 6016 is in the routine CopyOnWrite, and it fails
> upon any read error, or upon any write error other than ENOSPC. When this
> assertion fails, you should also have a core file for the "fileserver" process.
> The core dump will probably be named
> /usr/afs/logs/core.file.fs - or some such.
> You should probably rename it to something else before studying it; otherwise,
> it could be overwritten by another core dump. You can look at it with
> your favorite debugger (say, adb), with something like:
> # adb /usr/afs/bin/fileserver /usr/afs/logs/core.file.fs
> errno/D
> $c
> If errno was set by the read or write, then it is likely to be useful
> in terms of telling what the problem is. The $c will tell you where
> the assertion was that failed. If you don't see CopyOnWrite, then
> that may mean that some other assertion failed, and you will need to
> ask Transarc for more clues about what went wrong. With some patience,
> it is also possible to determine what disk and what volume were being
> updated, but you'll really want to have Transarc do this for you.
> You can facilitate this by saving a copy of your core dump and the
> corresponding fileserver binary somewhere your Transarc customer
> service representative can look at them.
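Since adb's `errno/D` prints a bare decimal number, it helps to translate it into a symbolic name. The Python sketch below is a hypothetical helper (the names `describe_errno` and `copyonwrite_would_assert` are made up for illustration), assuming Python is available on the same host that produced the core, because errno numbering is not identical across Unix variants. The second function only restates the CopyOnWrite rule described above; it is not Transarc's code.

```python
import errno
import os


def describe_errno(value):
    """Turn the decimal errno printed by adb's `errno/D` into 'NAME: message'.

    Uses this host's errno table, so run it on the machine that dumped core.
    """
    name = errno.errorcode.get(value, "UNKNOWN")  # symbolic name on this host
    return "%s: %s" % (name, os.strerror(value))


def copyonwrite_would_assert(operation, err):
    """Hypothetical restatement of the CopyOnWrite assertion rule.

    operation is "read" or "write"; err is the errno of the failed call.
    Any read error asserts; a write error asserts unless it is ENOSPC.
    Illustration only, not Transarc's source.
    """
    if operation == "read":
        return True
    if operation == "write":
        return err != errno.ENOSPC
    raise ValueError("operation must be 'read' or 'write'")
```

For example, if `errno/D` prints 5, `describe_errno(5)` reports whatever name and message this host uses for 5 (EIO on many Unix systems), and `copyonwrite_would_assert("write", errno.EIO)` is true, consistent with the description of the assertion.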
>
> A likely cause is a disk error. In this case, you should find that errno
> is set to EIO. This will not be the only clue that there are problems.
> You should also find that there are messages on the console about disk
> read and write errors, and these messages should also be recorded in some
> file on the system (often /var/adm/messages, but check to be sure.)
> These messages should include the name of the disk that was failing,
> and the block number. If you do find these, it's well worth your while
> to fix this as soon as possible, before you lose much data and time.
> A simple way to find many disk errors is to use "dd" to copy the raw
> or block device to /dev/null. Any errors before the end of the disk
> are cause for alarm (an error at the *end* of the disk is acceptable; some
> Unix disk drivers return an error instead of EOF when this condition is
> hit). Sometimes, your system will also come with a disk diagnostic aid
> that can format the disk; fancier versions may contain additional tests
> such as a non-destructive sequential read, or a random seek read, or
> some sort of write/read surface certification routine. It is not a bad
> idea to run a write/read surface certification routine for a day or so
> before putting a new disk into service. Be careful -- some of those
> tests may erase data on the disk.
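The "dd to /dev/null" check can also be sketched in Python. This is a minimal sketch, assuming you pass the raw or block device that holds the /vicep partition (no particular OSF/1 device name is assumed) and a block size that is a multiple of the disk's sector size. Like dd without conv=noerror, it stops at the first failed read and reports how far it got; `scan_for_read_errors` is a hypothetical name.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024):
    """Read `path` from start to finish, like `dd if=path of=/dev/null`.

    Returns (bytes_read, err): err is None if the scan reached EOF cleanly,
    otherwise the errno of the first failed read (the scan stops there).
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os.read(fd, block_size)
            except OSError as exc:
                return total, exc.errno  # first read error, e.g. EIO
            if not chunk:
                return total, None  # clean EOF
            total += len(chunk)
    finally:
        os.close(fd)
```

If `err` is not None, compare `bytes_read` with the size of the disk: an error at the very end may just be the driver's way of signalling EOF, while an EIO well before the end points at a bad region.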
>
> -Marcus Watts
> UM ITD PD&D Umich Systems Group
>
I ran the debugger as you suggested, but I could not get errno.
The output is as follows:
h2o:root 107 # gdb /usr/afs/bin/fileserver /usr/afs/logs.back.Mar.23.22/core.file.fs
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (alpha-dec-osf3.2), Copyright 1996 Free Software Foundation, Inc...
Core was generated by `fileserver'.
Program terminated with signal 6, IOT/Abort trap.
Reading symbols from /usr/shlib/libc.so...done.
#0 0x3ff801072d8 in __kill ()
at ../../../../../src/usr/ccs/lib/libc/alpha/kill.s:41
../../../../../src/usr/ccs/lib/libc/alpha/kill.s:41: No such file or directory.
(gdb)
Also, what does "file assertion" mean?
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Kwon O-Hoon
POSTECH Computing Center Researcher
Personal E-Mail : [EMAIL PROTECTED]
Official E-Mail : [EMAIL PROTECTED], [EMAIL PROTECTED]
Homepage : http://www.postech.ac.kr/~dolphin
Telephone : +82-562-279-2540
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=