[Lustre-devel] More LND: error handling

John R. Dunning Mon, 11 Dec 2006 06:53:35 -0800

    From: "Eric Barton" <[EMAIL PROTECTED]>
    Date: Mon, 11 Dec 2006 11:06:23 -0000
    
[...]    
    The guiding principles for completion are...
    
    1. If you return success from lnd_send or lnd_recv, you must call
       lnet_finalize() within finite time.
    
Right, I got that part.
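Principle 1 rules out an LND that waits forever for a peer that has gone away. Below is a minimal sketch of one way to meet it. It assumes the LND stores an absolute deadline in each transmit descriptor and sweeps those deadlines from a timer. The names `tx_desc`, `hw_cancel()` and `mock_lnet_finalize()` are hypothetical. The mock stands in for lnet_finalize(), whose real signature differs between Lustre versions. The timeout path must still obey principle 2, so the hypothetical cancel step must not return until the network has let go of the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for an LNet message (not the real type). */
struct mock_msg {
    int finalized;   /* set once "finalize" has been called */
    int status;      /* completion status passed to "finalize" */
};

/* Hypothetical stand-in for lnet_finalize(); real signature varies. */
static void mock_lnet_finalize(struct mock_msg *msg, int rc)
{
    assert(!msg->finalized);      /* must happen exactly once */
    msg->finalized = 1;
    msg->status = rc;
}

/* Hypothetical LND transmit descriptor. */
struct tx_desc {
    struct mock_msg *msg;
    time_t deadline;              /* absolute time the send must finish by */
    int done;                     /* already finalized */
};

/* Hypothetical: abandon the transfer and return only once the network
 * can no longer read or write the payload buffer (principle 2). */
static void hw_cancel(struct tx_desc *tx) { (void)tx; }

/* Run periodically so every accepted send completes in bounded time,
 * even if the peer never answers (principle 1). */
static void tx_check_deadline(struct tx_desc *tx, time_t now)
{
    if (tx->done || now < tx->deadline)
        return;
    hw_cancel(tx);                /* buffer is now ours again */
    tx->done = 1;
    mock_lnet_finalize(tx->msg, -ETIMEDOUT);
}
```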


    2. You may only call lnet_finalize() when there is no longer any
       chance that the underlying network can touch (read or write) the
       payload buffer.

Yes.  Not surprisingly, that's the trickiest part.  But it's all stuff that we
control, so it can be done.
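One common way to tell when "no longer any chance" holds is to count the network operations that still reference the payload. The LND then finalizes only when that count reaches zero. The sketch below assumes each outstanding operation reports exactly once, with success or an error. An example would be a local DMA completion plus a peer acknowledgement. The names `tx_desc`, `net_refs` and `mock_lnet_finalize()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for an LNet message. */
struct mock_msg { int finalized; int status; };

/* Hypothetical stand-in for lnet_finalize(). */
static void mock_lnet_finalize(struct mock_msg *msg, int rc)
{
    assert(!msg->finalized);
    msg->finalized = 1;
    msg->status = rc;
}

/* Hypothetical descriptor: net_refs counts network operations that may
 * still read or write the payload buffer. Single-threaded here; a real
 * LND would update it atomically. */
struct tx_desc {
    struct mock_msg *msg;
    int net_refs;
    int status;                   /* first error seen, 0 if none */
};

static void tx_start(struct tx_desc *tx, struct mock_msg *msg, int nops)
{
    tx->msg = msg;
    tx->net_refs = nops;
    tx->status = 0;
}

/* Called once per network completion event; finalizes only when the
 * last event arrives, i.e. when nothing can touch the buffer. */
static void tx_net_event(struct tx_desc *tx, int rc)
{
    assert(tx->net_refs > 0);
    if (rc != 0 && tx->status == 0)
        tx->status = rc;
    if (--tx->net_refs == 0)
        mock_lnet_finalize(tx->msg, tx->status);
}
```

One failed event can mean the others never arrive. In that case the LND must cancel them, as in a timeout, rather than wait, or it breaks principle 1.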
    
    3. The completion status on sends isn't critical.  Lustre only really
       needs to know that sending is over; knowing whether the send was
       good or not is really just icing on the cake (e.g. so that it
       doesn't have to wait for a full timeout for an RPC reply if sending
       the request failed).

Ok.
    
    4. The completion status on receives is completely critical.  You may
       only return success if the sink buffer has been filled correctly.
    
Of course.
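Principles 3 and 4 differ mainly in how much care the status needs. The sketch below assumes the LND knows how many bytes it set up the sink buffer to receive and how many actually arrived. The names `rx_desc`, `rx_complete()`, `tx_complete()` and `mock_lnet_finalize()` are hypothetical. Mapping a short receive to `-EIO` is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message. */
struct mock_msg { int finalized; int status; };

/* Hypothetical stand-in for lnet_finalize(). */
static void mock_lnet_finalize(struct mock_msg *msg, int rc)
{
    assert(!msg->finalized);
    msg->finalized = 1;
    msg->status = rc;
}

/* Hypothetical receive descriptor. */
struct rx_desc {
    struct mock_msg *msg;
    size_t expected;              /* bytes the sink buffer should get */
};

/* Principle 4: report success only if the network saw no error AND the
 * whole expected payload landed in the sink buffer. */
static void rx_complete(struct rx_desc *rx, int net_rc, size_t received)
{
    int rc = net_rc;

    if (rc == 0 && received != rx->expected)
        rc = -EIO;
    mock_lnet_finalize(rx->msg, rc);
}

/* Principle 3: the send status is advisory; pass along what we know so
 * the caller may avoid waiting out a full RPC timeout. */
static void tx_complete(struct mock_msg *msg, int net_rc)
{
    mock_lnet_finalize(msg, net_rc);
}
```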
    
    From: Scott Atchley <[EMAIL PROTECTED]>
    Date: Mon, 11 Dec 2006 07:21:38 -0500
    
    
    Two other comments:
    
    1) Do not hold any locks when calling any lnet_ functions.

Yikes.  Yes.  I'm pretty sure I wasn't, but good to keep in mind.

Does that really mean no locks at all, or no locks that could turn into
recursive lock attempts due to lnet calling back in?  Are the lnet things
(which get called into by lnd) all non-blocking?
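The thread leaves open whether the rule covers all locks or only those LNet could re-enter. The sketch below takes the conservative reading. While the LND's own lock is held, completed descriptors are unlinked onto a private list. lnet_finalize(), here the hypothetical `mock_lnet_finalize()`, is called only after that lock is dropped. A pthread mutex stands in for whatever kernel lock a real LND would use. The names `active_list`, `tx_desc` and `lnd_reap_completions()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message. */
struct mock_msg { int finalized; int status; };

static pthread_mutex_t lnd_lock = PTHREAD_MUTEX_INITIALIZER;
static int lnd_lock_held;         /* sanity flag for this sketch only */

/* Hypothetical stand-in for lnet_finalize(); checks the lock rule. */
static void mock_lnet_finalize(struct mock_msg *msg, int rc)
{
    assert(!lnd_lock_held);       /* no LND lock held when calling LNet */
    msg->finalized = 1;
    msg->status = rc;
}

struct tx_desc {
    struct tx_desc *next;
    struct mock_msg *msg;
    int done;                     /* set by the completion handler */
    int status;
};

static struct tx_desc *active_list;   /* protected by lnd_lock */

/* Unlink finished descriptors under the lock, then finalize unlocked. */
static void lnd_reap_completions(void)
{
    struct tx_desc *zombies = NULL;
    struct tx_desc **pp;

    pthread_mutex_lock(&lnd_lock);
    lnd_lock_held = 1;
    pp = &active_list;
    while (*pp != NULL) {
        struct tx_desc *tx = *pp;
        if (tx->done) {
            *pp = tx->next;       /* remove from the active list */
            tx->next = zombies;
            zombies = tx;
        } else {
            pp = &tx->next;
        }
    }
    lnd_lock_held = 0;
    pthread_mutex_unlock(&lnd_lock);

    while (zombies != NULL) {     /* no locks held from here on */
        struct tx_desc *tx = zombies;
        zombies = tx->next;
        mock_lnet_finalize(tx->msg, tx->status);
    }
}
```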
    
    2) Make sure you are _completely_ done with your buffer before  
    calling lnet_finalize(). I ran into a race condition where I called  
    lnet_finalize() then placed the rx or tx descriptor on my idle  
    queue. :-)

Yes, that would probably be a Bad Thing (tm).  
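The race described above comes down to ordering. Every last touch of the descriptor and buffer must happen before the finalize call, and that includes returning the descriptor to the idle queue. This sketch assumes LNet may free or reuse the buffer as soon as that call is made. The idle queue, `tx_done()` and `mock_lnet_finalize()` are hypothetical. The mock records how many idle descriptors it sees, so the ordering can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message. */
struct mock_msg {
    int finalized;
    int status;
    int idle_at_finalize;         /* idle descriptors seen at finalize */
};

struct tx_desc {
    struct tx_desc *next;
    struct mock_msg *msg;
    void *payload;                /* buffer the network was sending from */
};

static pthread_mutex_t idle_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tx_desc *idle_list;     /* protected by idle_lock */

static int idle_count(void)
{
    int n = 0;

    pthread_mutex_lock(&idle_lock);
    for (struct tx_desc *t = idle_list; t != NULL; t = t->next)
        n++;
    pthread_mutex_unlock(&idle_lock);
    return n;
}

/* Hypothetical stand-in for lnet_finalize(). After it returns, assume
 * the message and its buffer may already be freed or reused. */
static void mock_lnet_finalize(struct mock_msg *msg, int rc)
{
    msg->finalized = 1;
    msg->status = rc;
    msg->idle_at_finalize = idle_count();
}

static void tx_done(struct tx_desc *tx, int rc)
{
    struct mock_msg *msg = tx->msg;   /* remember what to finalize */

    /* 1. Finish every use of the descriptor and buffer first. */
    tx->msg = NULL;
    tx->payload = NULL;
    pthread_mutex_lock(&idle_lock);
    tx->next = idle_list;
    idle_list = tx;
    pthread_mutex_unlock(&idle_lock);

    /* 2. Only now hand the message back, with no locks held. */
    mock_lnet_finalize(msg, rc);
}
```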

Thanks...

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
