Alex Lee wrote: > I been seeing something that looks like IB timeout errors lately after > upgrading to 1.6.5.1 using the supplied ofed kernel drivers. > > From what I can tell there hasnt been any real network issues that was > apparent. Are these errors just typical if the network is busy?
Could be. If you are having actual IB issues, there is generally an error from IB prior to any Lustre errors. (And generally lots of LustreErrors - we really get unhappy when your network breaks) What we see here are two requests timing out, one with a 6sec limit, one 20 sec. No indication of a related error, just a timeout exceeded. These timeouts can happen on a busy network. If they are frequent, you should increase obd_timeout. cliffw > > > Heres from MDS: > Sep 8 16:04:19 lustre-mds-0-0 kernel: LustreError: > 10001:0:(events.c:55:request_out_callback()) @@@ ty > pe 4, status -5 [EMAIL PROTECTED] x15310157/t0 > o104-><F0>'<84><88><FF><FF><FF><FF>[EMAIL PROTECTED] > c1d08_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 2 fl Rpc:N/0/0 > rc 0/0 > Sep 8 16:04:19 lustre-mds-0-0 kernel: Lustre: Request x15310157 sent > from lfs-MDT0000 to NID 10.12.29. > [EMAIL PROTECTED] 2s ago has timed out (limit 6s). > Sep 8 16:04:23 lustre-mds-0-0 kernel: Lustre: Request x15310047 sent > from lfs-MDT0000 to NID 10.12.28. > [EMAIL PROTECTED] 6s ago has timed out (limit 6s). > Sep 8 16:04:43 lustre-mds-0-0 kernel: LustreError: > 10003:0:(events.c:55:request_out_callback()) @@@ ty > pe 4, status -5 [EMAIL PROTECTED] x15310096/t0 > o104-><F0>'<84><88><FF><FF><FF><FF>[EMAIL PROTECTED] > c1d0e_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 1 fl > Complete:XN/0/0 rc 0/0 > Sep 8 16:05:07 lustre-mds-0-0 kernel: LustreError: > 3930:0:(events.c:55:request_out_callback()) @@@ typ > e 4, status -113 [EMAIL PROTECTED] x15310047/t0 > o104-><F0>'<84><88><FF><FF><FF><FF>[EMAIL PROTECTED] > 0c1ceb_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 1 fl > Complete:XN/0/0 rc 0/0 > Sep 8 16:08:44 lustre-mds-0-0 kernel: Lustre: Skipped 1 previous > similar message > Sep 8 16:13:24 lustre-mds-0-0 kernel: Lustre: Skipped 4 previous > similar messages > > On the OSS: > Sep 9 00:24:55 lustre-oss-4-1 kernel: Lustre: Skipped 3 previous > similar messages > Sep 9 00:25:01 lustre-oss-4-1 kernel: Lustre: Request x784766 sent from > lfs-OST0039 to NID 10.12.29.7@ > o2ib 20s ago has timed out (limit 20s). > Sep 9 00:25:31 lustre-oss-4-1 kernel: LustreError: > 13228:0:(o2iblnd_cb.c:2874:kiblnd_check_conns()) Ti > med out RDMA with [EMAIL PROTECTED] > Sep 9 00:25:31 lustre-oss-4-1 kernel: LustreError: > 13228:0:(events.c:55:request_out_callback()) @@@ ty > pe 4, status -103 [EMAIL PROTECTED] x784766/t0 o104->@:15/16 lens > 232/256 e 0 to 20 dl 1220887501 r > ef 1 fl Complete:XN/0/0 rc 0/0 > > -Alex > _______________________________________________ > Lustre-discuss mailing list > [email protected] > http://lists.lustre.org/mailman/listinfo/lustre-discuss _______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
