On Wed, 2007-12-12 at 18:52 +0300, Anatoly Oreshkin wrote:
>
> In this case read test reads a number of files and then hangs on some file.
> The command dmesg issued on client node gives such error messages:
>
> LustreError: 5017:0:(socklnd.c:1599:ksocknal_destroy_conn()) Completing partial
> receive from [EMAIL PROTECTED], ip 85.142.10.197:988, with error

I'm not really sure of the origin or meaning of this message, but it seems
pretty clear: this looks like a networking issue.

...

> hw tcp v4 csum failed
> hw tcp v4 csum failed

And this makes it look even more like a networking issue.

> Dmesg issued on head node gives errors:
>
> LustreError: 15048:0:(ost_handler.c:821:ost_brw_read()) @@@ timeout on bulk PUT
> [EMAIL PROTECTED] x4566962/t0 o3->[EMAIL PROTECTED]:-1 lens 384/336 ref 0 fl Interpret:/0/0 rc 0/0
> Lustre: 15048:0:(ost_handler.c:881:ost_brw_read()) vtrak1fs-OST0000: ignoring
> bulk IO comm error with [EMAIL PROTECTED] id [EMAIL PROTECTED] - client will retry
> Lustre: 14987:0:(ldlm_lib.c:519:target_handle_reconnect()) vtrak1fs-OST0000:
> 629198c9-085d-f95a-462f-b5e535904a3d reconnecting

These are all further indications of networking issues. I think you need to
start debugging your network.

b.

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
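
As a possible first step in the network debugging suggested above, here is a minimal sketch that tallies the network-related kernel messages quoted in this thread from saved dmesg output. The function name `tally_network_errors` and the category labels are hypothetical, chosen only for illustration; the patterns are copied from the quoted messages and are not an exhaustive list of Lustre or kernel network errors.

```python
import re
from collections import Counter

# Patterns copied from the dmesg lines quoted in this thread.
# The labels on the left are illustrative names, not Lustre terminology.
PATTERNS = {
    "checksum_failure": re.compile(r"hw tcp v4 csum failed"),
    "partial_receive": re.compile(
        r"ksocknal_destroy_conn\(\).*Completing\s+partial\s+receive"
    ),
    "bulk_timeout": re.compile(r"timeout on bulk PUT"),
    "client_reconnect": re.compile(r"target_handle_reconnect\(\).*reconnecting"),
}


def tally_network_errors(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

To use it, save the output of dmesg on a node to a file and pass the open file (or any iterable of lines) to `tally_network_errors`. A steadily growing checksum-failure count on one node would be a reason to look at that node's NIC, cabling, or switch port first.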
