From: "Eric Barton" <[EMAIL PROTECTED]>
Date: Mon, 11 Dec 2006 11:06:23 -0000
[...]
> The guiding principles for completion are...
> 1. If you return success from lnd_send or lnd_recv, you must call
> lnet_finalize() within finite time.

Right, I got that part.

> 2. You may only call lnet_finalize() when there is no longer any
> chance that the underlying network can touch (read or write) the
> payload buffer.

Yes. Not surprisingly, that's the trickiest part. But it's all stuff that we
control, so it can be done.

> 3. The completion status on sends isn't critical. Lustre only really
> needs to know that sending is over; knowing whether the send was
> good or not is really just icing on the cake (e.g. so that it
> doesn't have to wait for a full timeout for an RPC reply if sending
> the request failed).

Ok.

> 4. The completion status on receives is completely critical. You may
> only return success if the sink buffer has been filled correctly.

Of course.
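
For reference, a minimal user-space sketch of how the four rules above might be modelled. Everything here is hypothetical: toy_msg, toy_finalize(), toy_recv_done() and toy_send_done() are made-up names standing in for the real LNet structures and lnet_finalize(), whose actual signatures are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an LNet message; not the real struct. */
struct toy_msg {
    char  *sink;        /* payload buffer owned by the upper layer */
    size_t sink_len;    /* number of bytes the sink expects */
    int    finalized;   /* how many times completion was signalled */
    int    status;      /* 0 = success, negative = failure */
};

/* Hypothetical stand-in for lnet_finalize(): hands the buffer back. */
static void toy_finalize(struct toy_msg *msg, int status)
{
    msg->finalized++;
    msg->status = status;
}

/*
 * Receive completion. Success is reported only if the sink was filled
 * completely (rule 4), and the finalize call comes after the last
 * access to the payload (rule 2). Every accepted message is finalized
 * exactly once, on both the good and the bad path (rule 1).
 */
static void toy_recv_done(struct toy_msg *msg, const char *wire, size_t nob)
{
    int status = 0;

    if (nob != msg->sink_len)
        status = -1;                    /* wrong length: not a success */
    else
        memcpy(msg->sink, wire, nob);   /* last touch of the payload */

    toy_finalize(msg, status);
}

/*
 * Send completion. The status is best effort (rule 3), but the
 * finalize call itself is mandatory so the caller learns that
 * sending is over.
 */
static void toy_send_done(struct toy_msg *msg, int wire_status)
{
    toy_finalize(msg, wire_status);
}
```

The point of the sketch is only the shape of the contract: one finalize per message, issued after the last buffer access, with an honest status on the receive side.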
From: Scott Atchley <[EMAIL PROTECTED]>
Date: Mon, 11 Dec 2006 07:21:38 -0500
> Two other comments:
> 1) Do not hold any locks when calling any lnet_ functions.

Yikes. Yes. I'm pretty sure I wasn't, but good to keep in mind.
Does that really mean no locks at all, or no locks that could turn into
recursive lock attempts due to lnet calling back in? Are the lnet things
(which get called into by lnd) all non-blocking?

> 2) Make sure you are _completely_ done with your buffer before
> calling lnet_finalize(). I ran into a race condition where I called
> lnet_finalize() then placed the rx or tx descriptor on my idle
> queue. :-)

Yes, that would probably be a Bad Thing (tm).
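
One conservative way to read both comments, sketched in user-space C. All names are hypothetical (toy_tx, idle_list, toy_tx_done(), and toy_finalize() as a stand-in for lnet_finalize()); the "lock" is just a flag so the ordering can be checked, and this is not taken from any real LND.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message. */
struct toy_msg {
    int finalized;   /* how many times completion was signalled */
    int status;      /* 0 = success, negative = failure */
};

/* Hypothetical tx descriptor owned by the LND. */
struct toy_tx {
    struct toy_msg *msg;    /* message this descriptor is carrying */
    struct toy_tx  *next;   /* idle-list linkage */
};

static struct toy_tx *idle_list;    /* descriptors free for reuse */
static int lock_held;               /* models the LND's own lock */
static int finalize_under_lock;     /* set if finalize ran with the lock held */

/* Stand-in for lnet_finalize(); records whether a lock was held. */
static void toy_finalize(struct toy_msg *msg, int status)
{
    if (lock_held)
        finalize_under_lock = 1;
    msg->finalized++;
    msg->status = status;
}

/*
 * Completion path: save what finalize needs, detach the message from
 * the descriptor, recycle the descriptor under the lock, drop the
 * lock, and only then finalize.
 */
static void toy_tx_done(struct toy_tx *tx, int status)
{
    struct toy_msg *msg = tx->msg;

    tx->msg = NULL;         /* descriptor no longer refers to the message */

    lock_held = 1;          /* take the (modelled) idle-list lock */
    tx->next = idle_list;
    idle_list = tx;
    lock_held = 0;          /* release before calling out */

    toy_finalize(msg, status);
}
```

With this ordering the descriptor is already back on the idle queue and no LND lock is held when completion is signalled, which avoids the finalize-then-requeue sequence described above. Whether "no locks at all" is strictly required is the open question in this thread; the sketch simply takes the safe reading.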
Thanks...