Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11327

           What              |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #9515 Flag     |review?([EMAIL PROTECTED])  |review-


(From update of attachment 9515)
> void class_fail_export(struct obd_export *exp)
> {
>-        int rc, already_failed;
>+        int rc, already_failed, connecting;
> 
>         spin_lock(&exp->exp_lock);
>         already_failed = exp->exp_failed;
>-        exp->exp_failed = 1;
>+        connecting = exp->exp_connecting;
>+        if (!connecting)
>+                exp->exp_failed = 1;
>         spin_unlock(&exp->exp_lock);

The only objection I have here is that this appears to allow a reconnecting
client to stifle the server's effort to evict it.  I can understand that if
the reconnection has already started and is holding the locks then we can't
really interrupt it.  However, I wonder if we shouldn't mark the export
failed anyway and handle this more gracefully in target_handle_connect()
instead of LBUGging?

What about just returning -ENOTCONN where we currently LBUG() if we can't look
up the connection handle?  If that lookup succeeds, but the export is failed
after that, the current code at least holds a refcount on the export, and the
export will be destroyed after we (confusingly) return success for the
connection and drop the reference held by req->rq_export.  We could also set
rq_status to -ENOTCONN right at the end of target_handle_connect() if
exp_failed is set, but then there is still a very small window after that
check where the client sees success even though the export is destroyed.


>@@ -627,22 +627,35 @@ int target_handle_connect(struct ptlrpc_
>+                        spin_lock(&export->exp_lock);
>+                        if (export->exp_failed) { /* bug 11327 evict/conn race */
>+                                spin_unlock(&export->exp_lock);
>+                                CWARN("%s: exp %p connect racing with "
>+                                      "eviction\n", export->exp_obd->obd_name,
>+                                      export);
>+                                export = NULL;
>+                                rc = -EALREADY;
>+                                break;
>+                        }

Specifically, this check could be moved to the very end of the "success" path.
I'm also not sure that -EALREADY is right here; -ENOTCONN seems more accurate.

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
