Not much spam and no gnutella, but probably a web browser open on
the weather page that updates occasionally.
IPsec speed should be ok; the server is a PPro-200 and the client a
2 GHz Pentium 4, which should be able to keep up (doing AES and
HMAC-SHA1).
Seriously, it's not the cpu speed, although there could be hiccups
during SA renegotiation (but I don't think so).
If 'global' is a symlink to the volume id, the client hasn't been able
to mount the volume. Perhaps it timed out during the volume lookup or
something. Another reason would be a server-server conflict, but as you
only have a single server that probably isn't the case.
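You can check with ls; the path and volume id here are made up, and the
exact target format may vary:

    $ ls -l /coda/usr/gdt/global
    lr--r--r--  1 root  wheel  9 ... global -> #7f000123

A still-dangling '#<volume id>' target like that means venus never
managed to cover the mount point with the actual volume.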
This seemed to persist, even though I could do 'cfs lv'. My real
problem isn't that something bad happened, it's that I could not
recover.
Yup, norton is only a server thing. 'cfs fl' is in many cases an evil
operation: it flushes the data of cached objects in the specified
subtree. It shouldn't touch 'dirty' objects (i.e. the ones with
associated CML entries), but you never know.
So really 'cfs fl' should only discard cached data that is still in
the 'read-only' state, and thus should be safe at any time. If not,
it probably should be fixed.
'cfs fl' is mostly useful for debugging: it can be used to push an
object out of the cache, so that I can test or time the fetching of an
object.
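For example, something like this (the path is made up):

    cfs fl /coda/playground/testfile                # evict the cached copy
    time cat /coda/playground/testfile > /dev/null  # time a cold fetch

After the flush, the cat has to fetch the file from the server again,
so it times the network path rather than the local cache.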
Sure, so I was doing the wrong thing with it, but it still should not
have caused trouble.
The CML is an ordered log, and you cannot simply kill an entry within
the log. The only possible operations are stepping through the entries
with 'repair', or discarding the whole log with 'cfs purgeml'. Maybe
you could try 'removeinc'; I know someone here tried to make that work
for local-global conflicts, but I'm not sure it ever worked.
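For the record, purgeml is the nuclear option; the volume path here is
illustrative:

    cfs purgeml /coda/usr/gdt    # discards *all* pending CML entries
                                 # for the volume, losing local changes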
I should have tried purgeml, but given that I had a 'local
inconsistent object' I bet it would not have worked. Perhaps there
should be some way to drop the head entry off the CML, which seems to
be analogous to 'discardlocal' in repair. So maybe there is, and the
problem is that entering repair mode can fail. In my book, entering
repair mode should succeed any time there is a conflict.
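Something like the following is what I have in mind; the command names
are from the local-global repair tool, and I'm not sure they all work
in this state:

    repair
    > beginrepair /coda/usr/gdt/conflict
    > listlocal        # show the CML entries involved
    > discardlocal     # drop the current (head) entry
    > endrepair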
Ugh, these are local 'fake' fids, but during reintegration these should
have been replaced with correct global fids. Maybe your kernel is a bit
too aggressive in caching the directory contents, or the objects got
lost as a result of the cfs fl or failed repairs.
This is on NetBSD 1.6, but I have had similar experiences on FreeBSD.
Ok, so we have a CML entry to create a file named 'local', but the
client is unable to find the associated container file. That's a pretty
bad state right there. The name is typically the last name used to
access the object; I guess this got expanded as a file conflict and the
'local' directory entry was the last name used to access wi0.dump.
I never created anything called local. I am pretty sure this is from
the failed repair session.
> 08:56:56 Callback failed RPC2_DEAD (F) for ws [client-in-question]:65516
> 09:01:19 Callback failed RPC2_DEAD (F) for ws [client-in-question]:64967
These are a result of timeouts: for some reason the client is not
responding to (or not receiving) rpc2 callback messages. As a result
the server will kill all incoming connections from that client.
This probably happened during congestion on the link. That's life, and
it shouldn't cause lasting trouble (I'd expect the client to go from
write-disconnected (WD) to disconnected, and then back when the server
probe succeeds, at which point reintegration picks up again).
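By the way, if you don't want to wait for the next periodic probe,
'cfs checkservers' should force one.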
Hey, I've got 33k6; I'll throttle it to 28.8 for a while to see if I
get hit by similar problems.
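Probably with something like Linux traffic control; the device name and
parameters here are just a guess:

    tc qdisc add dev ppp0 root tbf rate 28800bit burst 4kb latency 400ms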
I don't think that will make the difference. Mine is nominally 33.6,
but I get 28 or 26. It could be that the BSD kernel support is
buggy. That would be fair enough after all the trouble I've seen with
Linux kernels and coda over the years.
The server should deal with retried reintegrations, not sure why it
doesn't seem to do that in your case.
Do you mean

    modification on client while WD
    try to reintegrate
    Backfetch times out, causing Store to fail
    pause
    try to reintegrate the same Store
    Backfetch works this time
    [successful store with no conflict]

is what you think happens with the current code?
My issue isn't

    I get timeouts once in a while and go disconnected when I think I
    shouldn't, but venus reintegrates stuff later so it's only
    annoying.

This would be an annoyance. It's more like

    When I modify files on the client, and the client was WD to start
    with, and no one else is writing to that volume, I end up with
    conflicts that I cannot repair and I have to reinit venus.

which would cause me to stop using coda if I hadn't already integrated
it into how I work.
What's the state of the realms branch and the future repair changes?
It seems like repair (and venus's representation of conflict state) is
bletcherous now. But it may be that the problem is in the NetBSD
kernel code.
I wonder if putting some more aggressive cache flushing into
venus/netbsd would help. I'd happily take not losing data over
performance, and then we'd know where to fix things. I admit I have
assumed that the problem is in venus, and that isn't necessarily
clear.
Greg Troxel <[EMAIL PROTECTED]>