Hi Jan,
Jan Harkes wrote:
What happens is that the client detects whether a server is up or down
based on the existence of a callback connection. So when the client
sends a probe, the server pings back on the callback connection.
Yep, I saw that in the log.
However, backfetches are using the same connection, and your backfetch is
taking a very long time. So the server is unable to send the ping back to the
client. This shouldn't be a problem because the server should be
responding with RPC2_BUSY which will make the client wait an extra 15
seconds or so. I guess at some point the client did give up, returned
ETIMEDOUT and disconnected.
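To make the timing concrete, here is a rough sketch of the behaviour described above, in Python. It is only a model, not the RPC2 code: the function name, the event tuples and the initial timeout value are assumptions made for illustration, and only the "extra 15 seconds or so" per BUSY comes from the description.

```python
# Illustrative model only; not the RPC2 implementation.
BUSY_EXTENSION = 15.0  # seconds; "an extra 15 seconds or so" per the text


def probe_result(replies, initial_timeout=15.0, busy_extension=BUSY_EXTENSION):
    """Return 'OK' or 'ETIMEDOUT' for a single probe sent at t=0.

    replies: list of (time, kind) tuples sorted by time, where kind is
    'BUSY' (server alive but occupied) or 'REPLY' (the real answer).
    initial_timeout is an assumed value, not taken from RPC2.
    """
    deadline = initial_timeout
    for when, kind in replies:
        if when > deadline:
            break  # the client gave up before this packet arrived
        if kind == "REPLY":
            return "OK"
        if kind == "BUSY":
            deadline = when + busy_extension  # wait a bit longer
    return "ETIMEDOUT"
```

In this model a server that keeps sending BUSY often enough holds the client indefinitely, while a server that sends nothing at all (the single connection being tied up by the backfetch) runs the client into ETIMEDOUT after the first deadline.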
Hmm. How is this RPC2_BUSY supposed to work?
Should this ObtainWriteLock somehow time out? Which thread should
send the BUSY reply?
As far as I can see, the client sends the probe several times at
the RPC2 level without getting *any* reply.
And the server itself also 'loses faith' in the client a few
seconds after it starts hanging in this lock.
It then drops the callback connection.
long, it should have been broken up by the client.
This is actually a known bug (introduced somewhere between 6.0.9 and
6.0.12). The fix is fairly simple: just remove an unnecessary test.
I've attached the patch.
OK, will try that.
And arguably, the client shouldn't even have to probe the server because
clearly there is still traffic between the two. But that is more of an
optimization and not really a correctness issue.
Hmm, I don't know. In my tests I explicitly triggered the
probe by running
cfs cs
But Venus also probes automatically after a certain time (150s),
and this will take down the connection as well. So it seems logical
to monitor the client-server connection for any activity and to skip
explicit probes whenever Venus has received other traffic.
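What I have in mind is roughly the following (a Python sketch, not Venus code; the class and method names are made up for illustration, and only the 150s interval comes from the behaviour described above):

```python
PROBE_INTERVAL = 150.0  # seconds; the automatic probe period mentioned above


class ProbeScheduler:
    """Hypothetical helper: probe only after a quiet period."""

    def __init__(self, now=0.0, interval=PROBE_INTERVAL):
        self.interval = interval
        self.last_heard = now  # time of the last packet from the server

    def note_traffic(self, now):
        # Any packet from the server counts as proof that it is alive.
        self.last_heard = max(self.last_heard, now)

    def should_probe(self, now):
        # Probe only if nothing was heard for a full interval.
        return now - self.last_heard >= self.interval
```

With this, a long backfetch that keeps delivering packets would keep pushing the next probe back, so the probe would never collide with the transfer.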
This is actually not a reintegration write lock, this is caused by the
fact that there is only a single RPC2 connection from the server to the
client, so it can only do one thing at a time. Fetch a file, or send a
callback probe.
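The effect of that single connection can be put into numbers with a small sketch (Python, illustrative only; the function and its parameters are hypothetical and do not correspond to anything in the server):

```python
def callback_ping_delay(fetch_start, fetch_duration, probe_time):
    """Seconds a callback ping has to wait behind a running backfetch.

    Assumes one connection that handles exactly one operation at a time.
    """
    fetch_end = fetch_start + fetch_duration
    if fetch_start <= probe_time < fetch_end:
        return fetch_end - probe_time  # ping is queued behind the fetch
    return 0.0  # connection is idle, ping goes out immediately
```

For example, a probe arriving 100 seconds into a 600 second backfetch would have to wait 500 seconds for its ping, far longer than the client is willing to wait.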
You mentioned that clients should break up their store operations.
Can they break a transfer into pieces smaller than the file size,
or do they have to transmit at least one file completely?
I think this matters for the ISO images.
Thanks for your help
Martin
--
+-[Martin Ginkel]-------------[mailto:mginkel(at)mpi-magdeburg.mpg.de]-+
| MPI Magdeburg, Zi S2.09 Sandtorstr. 1, D-39106 Magdeburg, Germany |
| What is this talk of 'release'? We are Klingons. Our software |
| 'escapes' leaving a bloody trail of designers and quality assurance |
+-[tel/fax: +49 391 6110 482/529]----[http://www.mpi-magdeburg.mpg.de]-+