>> LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101ae084000
>> x1358858531428366/t60136289752
>> o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297285890 ref 2 fl Interpret:R/0/0 rc 0/0
>
> One line before that, there should be the actual RPC error, which we need in order to know what happened.
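Oleg's check (look at the line logged just before each resend) is easy to script. This is a minimal sketch, assuming one syslog record per line as in /var/log/messages; the function name `lines_before_redo` is hypothetical. It pairs every "redo for recoverable error" message with the record that precedes it, which is where the underlying RPC error would be expected to appear.

```python
from typing import Iterable, Iterator, Tuple

# Text of the resend message as it appears in the client syslog.
REDO_TEXT = "redo for recoverable error"


def lines_before_redo(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (previous_line, redo_line) for every redo message.

    Assumes one syslog record per line, as in /var/log/messages.
    A redo message on the very first line is paired with an empty string.
    """
    previous = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if REDO_TEXT in line:
            yield previous, line
        previous = line
```

For example, passing an open handle on /var/log/messages to `lines_before_redo` and printing each `previous_line` shows whether anything other than another redo message ever precedes them.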
Nope, just that error repeated:

Feb 9 03:19:26 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101aaa3dc00 x1358858525376456/t60135184183 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297246810 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 03:29:56 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101aaa3d000 x1358858525468762/t60135184397 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297247403 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 03:40:22 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101aaa3c400 x1358858525557912/t60135184598 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297248029 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 03:51:18 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101afef2400 x1358858525655268/t60135392181 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297248685 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 04:01:40 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101aaa3dc00 x1358858525738536/t60135185019 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297249307 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 04:12:04 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101aaa3c400 x1358858525822214/t60135185246 o4->[email protected]@o2ib:6/4 lens 464/608 e 0 to 1 dl 1297249931 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 10:48:28 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101addda800 x1358858527540672/t60134973305 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297273752 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 10:49:51 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101adf3c800 x1358858527567804/t60134976801 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297273835 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 10:52:22 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101adddb000 x1358858527619100/t60134983332 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297273986 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 10:57:23 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101addda000 x1358858527728677/t60134998617 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297274250 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 11:07:23 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101adddbc00 x1358858527926588/t60135043030 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297274887 ref 2 fl Interpret:R/0/0 rc 0/0

The above is from 'less /var/log/messages', not a false negative from grepping the logs for osc_brw, lustre, etc. In addition to the above, I also see this sequence:

Feb 9 11:57:41 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101add34000 x1358858528880660/t64430183909 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297277905 ref 2 fl Interpret:R/0/0 rc 0/0
to 1 dl 1297278471 ref 2 fl Interpret:R/0/0 rc 0/0
Feb 9 12:07:42 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) Skipped 1000 previous similar messages
Feb 9 12:15:12 nm-post-2 kernel: LustreError: 400:0:(osc_request.c:1143:can_merge_pages()) is it ok to have flags 0xc20 and 0x420 in the same brw?
Feb 9 12:15:12 nm-post-2 kernel: LustreError: 400:0:(osc_request.c:1143:can_merge_pages()) Skipped 43 previous similar messages
Feb 9 12:15:50 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1143:can_merge_pages()) is it ok to have flags 0xc20 and 0x420 in the same brw?
Feb 9 12:15:50 nm-post-2 kernel: LustreError: 3935:0:(osc_request.c:1143:can_merge_pages()) Skipped 1 previous similar message

>> which in turn appears to generate a premature EOF in our user software.
>
> Actually, this message does what it says: it resends the request, so userspace should not notice
> any problems. On the other hand, if any requests other than brw requests fail, they might not
> get the benefit of resending and could cause userspace-visible errors.

I glanced at the source, and my initial impression was what you just said: this is an internal retry. On the other hand, there seems to be a tight correlation between these messages and the user-space EOF occurrences.

Thanks for the quick response.

James

> Bye,
> Oleg

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
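One way to put a number on the correlation James describes is to line up the syslog timestamps of the redo messages with the times the user software hit EOF. This is a minimal sketch under stated assumptions: the EOF timestamps and the matching window are hypothetical inputs (the thread gives neither), syslog stamps carry no year so it must be passed in, and the function names are illustrative only. "Skipped N previous similar messages" lines are not counted, because the suppressed repeats have no timestamps of their own; the logged redo times are therefore only a subset of the actual resends.

```python
import bisect
from datetime import datetime, timedelta
from typing import Iterable, List, Sequence

# Text of the resend message itself; rate-limit summaries such as
# "osc_brw_redo_request()) Skipped 1000 previous similar messages" do not contain it.
REDO_TEXT = "redo for recoverable error"


def syslog_time(line: str, year: int) -> datetime:
    """Parse the leading 'Mon D HH:MM:SS' stamp; syslog omits the year, so pass it in."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def redo_times(lines: Iterable[str], year: int) -> List[datetime]:
    """Timestamps of every logged redo message, in log order."""
    return [syslog_time(line, year) for line in lines if REDO_TEXT in line]


def eofs_near_redo(eofs: Iterable[datetime], redos: Sequence[datetime],
                   window: timedelta) -> int:
    """Count EOF events that occur no more than `window` after some redo message."""
    ordered = sorted(redos)
    hits = 0
    for eof in eofs:
        # ordered[i - 1] is the latest redo logged at or before this EOF, if any.
        i = bisect.bisect_right(ordered, eof)
        if i and eof - ordered[i - 1] <= window:
            hits += 1
    return hits
```

Comparing the hit count against the total number of EOFs (and against a run with shifted or shuffled EOF times) gives a rough sense of whether the correlation is stronger than chance.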
