I have a fairly simple Lustre environment that consists of a single MDS and 2 OSSs, each with 4 OSTs. The servers and clients are all running Lustre 1.8.5 under RHEL 5.5, using RPMs downloaded from lustre.
Normally I've had no problems, but recently multiple clients have been reporting the following error:

LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8101ae084000 x1358858531428366/t60136289752 o4->[email protected]@o2ib:6/4 lens 448/608 e 0 to 1 dl 1297285890 ref 2 fl Interpret:R/0/0 rc 0/0

This in turn appears to cause a premature EOF in our user software. There are no corresponding errors on the servers. I only seem to see this error on clients connected via QDR InfiniBand, though that may be a false lead. The problem also seems more prevalent under load. Lastly, it seems to be getting worse, almost as if there's some garbage collection issue on the clients.

I've done some searching and don't see any reports involving that routine. It looks like a timeout of some sort. Any hints as to what problem this error indicates?

james
