Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11327



Created an attachment (id=9514)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9514&action=view)
proposed patch against 1.4.8-5chaos

Proposed fix to resolve a connection / eviction race.  Here's the basic
race; full details can be seen in the attached kernel dk log.

ping_evictor                            ll_ost_io_77
---------------------------------------------------------------------
- ping_evictor_main()
- timeout evict client
- exp->exp_fail = 1
                                        - target_handle_connect()
                                        - valid (but failed) export found
                                          in target->obd_exports
                                        - exp->exp_connecting = 1
- class_disconnect destroys cookie
                                        - export = class_conn2export(&conn)
                                          lookup now fails due to the
                                          missing cookie.
                                        - LASSERT(export != NULL)


The proposed fix checks whether exp_fail is set in the export when it's
scanned for in target->obd_exports.  If it's set we bail because this
export will soon be destroyed.  This patch also ensures that if an export
is successfully found and exp_fail is not set, exp_connecting will
be set under the exp_lock spin_lock.  This can then be used to prevent
class_fail_export() from setting exp_fail and destroying the export
while the connect is in progress.
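
The locking rule described above can be sketched roughly as follows.  This
is a simplified userspace model, not the attached patch and not real Lustre
code: struct model_export and the model_* functions are hypothetical names,
and a pthread mutex stands in for the exp_lock spinlock.  It only shows the
idea that each path checks the other path's flag and sets its own flag
inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an export; not the real struct obd_export. */
struct model_export {
    pthread_mutex_t lock;   /* plays the role of exp_lock */
    bool failed;            /* plays the role of exp_fail */
    bool connecting;        /* plays the role of exp_connecting */
};

void model_export_init(struct model_export *exp)
{
    pthread_mutex_init(&exp->lock, NULL);
    exp->failed = false;
    exp->connecting = false;
}

/*
 * Connect path: claim the export unless it has already been failed.
 * The check of 'failed' and the store to 'connecting' happen under
 * the same lock, so the evictor cannot slip in between them.
 * Returns true if the caller may proceed with the connect.
 */
bool model_connect_begin(struct model_export *exp)
{
    bool claimed = false;

    pthread_mutex_lock(&exp->lock);
    if (!exp->failed) {
        exp->connecting = true;
        claimed = true;
    }
    pthread_mutex_unlock(&exp->lock);
    return claimed;
}

/* Connect path: release the claim once the connect has finished. */
void model_connect_end(struct model_export *exp)
{
    pthread_mutex_lock(&exp->lock);
    exp->connecting = false;
    pthread_mutex_unlock(&exp->lock);
}

/*
 * Evict path: mark the export failed unless a connect is in progress
 * or it was already failed.  Returns true if the caller now owns the
 * teardown of the export.
 */
bool model_fail_export(struct model_export *exp)
{
    bool may_destroy = false;

    pthread_mutex_lock(&exp->lock);
    if (!exp->connecting && !exp->failed) {
        exp->failed = true;
        may_destroy = true;
    }
    pthread_mutex_unlock(&exp->lock);
    return may_destroy;
}
```

In this model an eviction attempted while a connect is in flight is refused,
and a connect attempted after a successful eviction bails out instead of
tripping over a destroyed cookie later.  What the evictor should do after
being refused (retry, defer, drop) is not covered by this sketch.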

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
