On Wed, 2007-12-12 at 18:52 +0300, Anatoly Oreshkin wrote:
> 
> In this case read test reads a number of files and then hangs on some file.
> The command dmesg issued on client node gives such error messages:
> 
> LustreError: 5017:0:(socklnd.c:1599:ksocknal_destroy_conn()) Completing partial receive from [EMAIL PROTECTED], ip 85.142.10.197:988, with error

I'm not sure of the origin or exact meaning of this message, but the
gist seems pretty clear: this looks like a networking issue.

...
> hw tcp v4 csum failed
> hw tcp v4 csum failed

And this makes it look even more like a networking issue: TCP checksums
computed by the NIC hardware are failing to verify.

> Dmesg issued on head node gives errors:
> 
> LustreError: 15048:0:(ost_handler.c:821:ost_brw_read()) @@@ timeout on bulk PUT [EMAIL PROTECTED] x4566962/t0 o3->[EMAIL PROTECTED]:-1 lens 384/336 ref 0 fl Interpret:/0/0 rc 0/0
> Lustre: 15048:0:(ost_handler.c:881:ost_brw_read()) vtrak1fs-OST0000: ignoring bulk IO comm error with [EMAIL PROTECTED] id [EMAIL PROTECTED] - client will retry
> Lustre: 14987:0:(ldlm_lib.c:519:target_handle_reconnect()) vtrak1fs-OST0000: 629198c9-085d-f95a-462f-b5e535904a3d reconnecting

These are all further indications of networking issues.

I think you need to start debugging your network.
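
As a possible starting point, here is a rough sketch (not a Lustre tool)
that tallies the symptoms quoted above in saved dmesg output, so client
and server nodes can be compared. The function name, the category
labels and the choice of patterns are my own assumptions; the patterns
are only the strings that appear in the log excerpts in this thread.

```python
import re
from collections import Counter

# Patterns copied from the messages quoted in this thread.
# The category names on the left are hypothetical labels.
SIGNATURES = {
    "tcp_checksum_failure": re.compile(r"hw tcp v4 csum failed"),
    "partial_receive": re.compile(r"ksocknal_destroy_conn\(\)"),
    "bulk_timeout": re.compile(r"timeout on bulk"),
    "client_reconnect": re.compile(r"target_handle_reconnect\(\)"),
}


def tally_network_symptoms(lines):
    """Count how many lines match each known symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines of a dmesg dump from each node (for example, a
file opened with open()) gives a per-node count. A node with a high
tcp_checksum_failure count would be a reasonable first place to look at
the NIC, cabling and switch port.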

b.


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
