>>>>> "Jan" == Jan Harkes <[EMAIL PROTECTED]> writes:
Jan> In any case 'errorcode 198' is EINCOMPATIBLE. It is returned
Jan> when we're trying to write (store) data to a file that has
Jan> been modified, i.e. the store-id of the original copy on the
Jan> client doesn't match the store-id of the file on the server.
Jan> Your server log should show a message similar to,
Jan> CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error
Jan> (198)
Not quite; I see
CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)
where 1000015 is the volume ID that corresponds to the 7f00002a
replica ID (or do I have that backwards?).
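For comparing the two forms of that log line, here is a minimal sketch of a parser. It assumes the three dotted fields are hexadecimal and reads them as volume, vnode and uniquifier; that reading and the name parse_check_store are assumptions for illustration, not something the server defines.

```python
import re

# Matches e.g. "CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)".
# The optional "0x" prefixes cover the form shown in Jan's example.
LINE_RE = re.compile(
    r"CheckStoreSemantics: "
    r"\((?:0x)?([0-9a-fA-F]+)\.(?:0x)?([0-9a-fA-F]+)\.(?:0x)?([0-9a-fA-F]+)\),"
    r"\s+VCP error\s+\((\d+)\)"
)


def parse_check_store(line):
    """Return the fields of a CheckStoreSemantics line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Assumption: all three identifier fields are hexadecimal.
    volume, vnode, unique = (int(g, 16) for g in m.group(1, 2, 3))
    return {"volume": volume, "vnode": vnode, "unique": unique,
            "error": int(m.group(4))}


mine = parse_check_store("CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)")
jans = parse_check_store("CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error (198)")
```

With both lines parsed, the vnode and uniquifier fields agree and only the first field differs, which is the volume ID versus replica ID question above.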
Jan> It could be that this has made reintegration more susceptible
Jan> to failures. This is just one theory, but
Jan> 'volutil setlogparms <volume replica id> reson 4'
Jan> will turn resolution back on.
Urk, I'll try again, but with
volutil setlogparms 0x7f00002a reson 4
I got disconnected and the server crashed. :-(
The log shows lots of stats from the server followed by
19:46:40 done
19:46:50 VAllocFid: volume disk uniquifier being extended
19:46:50 ****** FILE SERVER INTERRUPTED BY SIGNAL 11 ******
19:46:50 ****** Aborting outstanding transactions, stand by...
19:46:50 Uncommitted transactions: 0
19:46:50 Uncommitted transactions: 0
19:46:50 Becoming a zombie now ........
19:46:50 You may use gdb to attach to 389
Restarting codasrv shows 47 "unreachable" log entries, and the log
ends with
20:30:52 Entering DCC(0x1000013)
20:30:52 done: 3378 files/dirs, 101945 blocks
20:30:52 SalvageIndex: Vnode 0x9e4 has no inodeNumber
20:30:52 SalvageIndex: Creating an empty object for it
20:30:52 Entering DCC(0x1000015)
20:30:52 MarkLogEntries: loglist was NULL ... Not good
Uh-oh ... the server crashed right there. Now what do I do? It
doesn't look like I can get it to start at all!
bash-2.05b$ cat /vice/srv/SrvErr
Assertion failed: 0, file "vol-salvage.cc", line 851
EXITING! Bye!
Why is inconsistent data for a single volume a fatal error? Couldn't
we just take that volume offline?
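To make the question concrete, here is a hypothetical sketch of what "take that volume offline" could look like. It is not how codasrv or vol-salvage.cc is structured; salvage_all, salvage_one and fake_salvage are invented names, and the two volume IDs are simply the ones from the log above.

```python
def salvage_all(volume_ids, salvage_one):
    """Salvage each volume; record failures instead of aborting everything."""
    online, offline = [], []
    for vid in volume_ids:
        try:
            salvage_one(vid)
        except Exception as exc:
            # Hypothetical policy: keep the server up, leave this volume offline.
            offline.append((vid, str(exc)))
        else:
            online.append(vid)
    return online, offline


def fake_salvage(vid):
    # Stand-in for the real salvager: fail only for the damaged volume.
    if vid == 0x1000015:
        raise RuntimeError("loglist was NULL")


online, offline = salvage_all([0x1000013, 0x1000015], fake_salvage)
```

Here the healthy volume stays available and the damaged one is reported rather than taking the whole server down.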
Jan> Another interesting fact is that the first entry in your CML
Jan> is a store. Perhaps the client got disconnected during the
Jan> connected store attempt, and this is essentially a replay of
Jan> an already committed operation.
I get that complaint from venus a lot on a (non-init) start after one
of these volume-specific write disconnects. This used to cause venus
to refuse to do anything with the volume involved, until the kernel
upgrade to 2.4.20. Now I can generally get by with the repair sequence
begin INCOBJ, discardalllocal, end, quit. However, that volume is rather
unstable thereafter. I don't think I've ever successfully repaired one of
these conflicts.
And of course sometimes venus just refuses to admit that the conflict
exists when I ask, although the console shows it refusing to
reintegrate because of a conflict. Not even cfs ck always helps.
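Jan's replay theory can be modelled in a few lines. This is a toy sketch, not the server's code: ServerFile, the integer store-ids and the store function are assumptions made up for the example. Only the error number 198 comes from this thread.

```python
from dataclasses import dataclass

EINCOMPATIBLE = 198  # error code quoted earlier in this thread


@dataclass
class ServerFile:
    store_id: int  # hypothetical: opaque id of the last committed store
    data: bytes


def store(f, expected_store_id, new_store_id, data):
    """Apply a store only if the client's view of the file is current."""
    if f.store_id != expected_store_id:
        return EINCOMPATIBLE  # file changed since the client fetched it
    f.store_id = new_store_id
    f.data = data
    return 0


f = ServerFile(store_id=1, data=b"old")
# The connected store commits, but assume the reply never reaches the client.
first = store(f, expected_store_id=1, new_store_id=2, data=b"new")
# After the disconnect, the same operation is replayed from the CML.
replay = store(f, expected_store_id=1, new_store_id=2, data=b"new")
```

In this model the first call succeeds and the replay is rejected with 198, even though the data on the server is already what the client wanted to write. Whether the real server can recognise such a replay is not something this sketch tries to answer.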
Jan> So we just 'invented' fake identifiers in the range from 0xea
Jan> to 0x27b. But the kernel is asking for fake objects that are
Jan> clearly out of this range. Which would indicate that the
Jan> directory data that the kernel is using is
Jan> incorrect/outdated. Possibly caused by a process that is
Jan> blocking a re-open by having its cwd in the offending
Jan> location.
That wouldn't be surprising, because cvs very likely cd's into the
directory it's working on.
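The range test Jan describes amounts to the following sketch. The bounds are the ones quoted above; treating both ends as inclusive, and the name is_known_fake_id, are assumptions.

```python
FAKE_ID_FIRST = 0xEA   # lower bound quoted by Jan
FAKE_ID_LAST = 0x27B   # upper bound quoted by Jan


def is_known_fake_id(ident):
    """True if ident lies in the range of fake identifiers that were handed out."""
    # Assumption: both bounds are inclusive.
    return FAKE_ID_FIRST <= ident <= FAKE_ID_LAST
```

A lookup for an identifier that fails this test would point at stale directory data on the kernel side, as Jan suggests.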
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.