> Normally I've had no problems but recently I have multiple clients
> reporting the following error:
>
> LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo
> for recoverable error req@ffff8101ae084000 x1358858531428366/t60136289752
> o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl
> 1297285890 ref 2 fl Interpret:R/0/0 rc 0/0
>
> which in turn appears to generate a premature EOF on our user software.
>
> There are no corresponding errors on the servers.
The above is not true. There are apparently corresponding errors of the form:

Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: 2964:0:(ost_handler.c:1038:ost_brw_write()) client csum f00001, server csum 964d53e2
Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: 2964:0:(ost_handler.c:1038:ost_brw_write()) Skipped 43 previous similar messages
Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: 168-f: lustre-OST0000: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.64.1.212@tcp inum 2981338/1802650709 object 8183950/0 extent [2384461824-2385510399]
Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: Skipped 43 previous similar messages
Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: 2964:0:(ost_handler.c:1100:ost_brw_write()) client csum f00001, original server csum 964d53e2, server csum now 964d53e2
Feb 9 17:05:21 lustre-oss-1 kernel: LustreError: 2964:0:(ost_handler.c:1100:ost_brw_write()) Skipped 43 previous similar messages
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: 3035:0:(ost_handler.c:1038:ost_brw_write()) client csum f00001, server csum 180cd9bd
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: 3035:0:(ost_handler.c:1038:ost_brw_write()) Skipped 63 previous similar messages
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: 168-f: lustre-OST0000: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.64.1.212@tcp inum 2981338/1802650709 object 8183950/0 extent [4355784704-4356833279]
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: Skipped 63 previous similar messages
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: 3035:0:(ost_handler.c:1100:ost_brw_write()) client csum f00001, original server csum 180cd9bd, server csum now 180cd9bd
Feb 9 17:10:22 lustre-oss-1 kernel: LustreError: 3035:0:(ost_handler.c:1100:ost_brw_write()) Skipped 63 previous similar messages

The other OSS shows similar errors. We are doing mmap I/O and a search implies those errors are related to mmap I/O.
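To see which clients and OSTs are involved, the "BAD WRITE CHECKSUM" lines can be tallied from the OSS syslog. The following is only a rough sketch, assuming the message format is exactly as quoted above. The function name tally_bad_checksums is made up for illustration, and attributing each "Skipped N previous similar messages" line to the client named in the preceding BAD WRITE CHECKSUM line is an assumption, since the rate-limited messages may have come from other clients.

```python
import re
from collections import Counter

# Matches the "BAD WRITE CHECKSUM" console message in the format quoted above.
BAD_CSUM = re.compile(
    r"(?P<ost>\S+): BAD WRITE CHECKSUM: .*? from (?P<client>\S+) "
    r".*?extent \[(?P<start>\d+)-(?P<end>\d+)\]"
)
# Matches only the bare "LustreError: Skipped N ..." line, not the
# per-function variant that carries a pid and source location.
SKIPPED = re.compile(r"LustreError: Skipped (?P<n>\d+) previous similar messages")


def tally_bad_checksums(lines):
    """Return (logged, skipped) Counters keyed by (ost, client).

    logged  counts BAD WRITE CHECKSUM lines that actually appear.
    skipped sums the "Skipped N" line that directly follows each one;
            crediting those to the same client is an ASSUMPTION.
    """
    logged = Counter()
    skipped = Counter()
    last_key = None
    for line in lines:
        m = BAD_CSUM.search(line)
        if m:
            last_key = (m["ost"], m["client"])
            logged[last_key] += 1
            continue
        m = SKIPPED.search(line)
        if m and last_key is not None:
            skipped[last_key] += int(m["n"])
            last_key = None  # credit each "Skipped" line at most once
    return logged, skipped
```

Passing an open log file (or any iterable of lines) to tally_bad_checksums gives per-client totals, which should show whether the errors come from a few clients or from all of them.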
I'm open to suggestions. In the meantime, the userspace code can be switched from mmap to regular file I/O via an rc file, so we'll try that and see whether it at least makes the errors go away.

James
