Hi Stefano,

your client was evicted by the OST server. Could you tell us whether you have any failover configuration? To better understand failover and the timeout issue, read paragraph 4.3.8 of the manual.
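As a rough aid for reading the log quoted below, here is a minimal Python sketch that pulls the key events (request timeout, lost connection, eviction, reconnection) out of syslog lines. It is not part of Lustre: the function name `classify`, the event labels and the matching substrings are my own assumptions, chosen from the messages in your log, and the sample lines are your log entries joined back onto single lines.

```python
import re

# Syslog prefix as it appears in the quoted log:
# "May 19 13:43:43 mdt01prdpom kernel: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: (?P<msg>.*)$"
)

# Substrings copied from the quoted messages, mapped to hypothetical labels.
EVENTS = [
    ("has timed out", "request_timeout"),
    ("failed due to network error", "network_error"),
    ("was lost", "connection_lost"),
    ("was evicted by", "evicted"),
    ("Connection restored", "connection_restored"),
]


def classify(lines):
    """Return (timestamp, label) for every line that matches a known event."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel syslog line in the expected format
        for needle, label in EVENTS:
            if needle in m.group("msg"):
                found.append((m.group("ts"), label))
                break
    return found


# Entries taken from the quoted log, unwrapped to one entry per line.
SAMPLE = [
    "May 19 13:43:43 mdt01prdpom kernel: Lustre: 3827:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230433 sent from lustre01-OST0000-osc to NID 172.16.100....@tcp 17s ago has timed out (17s prior to deadline).",
    "May 19 13:43:43 mdt01prdpom kernel: Lustre: lustre01-OST0000-osc: Connection to service lustre01-OST0000 via nid 172.16.100....@tcp was lost; in progress operations using this service will wait for recovery to complete.",
    "May 19 13:46:31 mdt01prdpom kernel: LustreError: 167-0: This client was evicted by lustre01-OST0000; in progress operations using this service will fail.",
    "May 19 13:46:31 mdt01prdpom kernel: Lustre: lustre01-OST0000-osc: Connection restored to service lustre01-OST0000 using nid 172.16.100....@tcp.",
]

if __name__ == "__main__":
    for ts, label in classify(SAMPLE):
        print(ts, label)
```

Run on the sample, this shows the order of events: the request times out and the connection is lost at 13:43:43, and the eviction is reported at 13:46:31, the same second in which the connection is restored.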
Ciao.

On 05/19/2010 02:34 PM, Stefano Elmopi wrote:
> Hi,
>
> I have a small problem, which is surely due to my limited knowledge of
> the subject.
> I have a Lustre file system with one MGS/MDS node, two OSS nodes and one
> client.
> I start a copy of a large file onto Lustre and, while the copy is running,
> I restart the OSS node that is handling the writes to the file system.
> The copy process goes into the -stalled- state and, when the OSS node
> is back up, I expected the copy to resume normally, but instead it crashes.
> This is the log on the MGS node:
>
> May 19 13:43:43 mdt01prdpom kernel: Lustre: 3827:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230433 sent from lustre01-OST0000-osc to NID 172.16.100....@tcp 17s ago has timed out (17s prior to deadline).
> May 19 13:43:43 mdt01prdpom kernel: r...@ffff81012e11e400 x1336168048230433/t0 o400->[email protected]@tcp:28/4 lens 192/384 e 0 to 1 dl 1274269423 ref 1 fl Rpc:N/0/0 rc 0/0
> May 19 13:43:43 mdt01prdpom kernel: Lustre: lustre01-OST0000-osc: Connection to service lustre01-OST0000 via nid 172.16.100....@tcp was lost; in progress operations using this service will wait for recovery to complete.
> May 19 13:44:09 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230435 sent from lustre01-OST0000-osc to NID 172.16.100....@tcp 26s ago has timed out (26s prior to deadline).
> May 19 13:44:09 mdt01prdpom kernel: r...@ffff81012e5f2000 x1336168048230435/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269449 ref 1 fl Rpc:N/0/0 rc 0/0
> May 19 13:44:37 mdt01prdpom kernel: Lustre: 3829:0:(import.c:517:import_select_connection()) lustre01-OST0000-osc: tried all connections, increasing latency to 2s
> May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-172.16.100....@tcp: -113
> May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@ffff81012d3e5800 x1336168048230437/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 2 fl Rpc:N/0/0 rc 0/0
> May 19 13:44:37 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230437 sent from lustre01-OST0000-osc to NID 172.16.100....@tcp 0s ago has failed due to network error (27s prior to deadline).
> May 19 13:44:37 mdt01prdpom kernel: r...@ffff81012d3e5800 x1336168048230437/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 1 fl Rpc:N/0/0 rc 0/0
> May 19 13:45:33 mdt01prdpom kernel: Lustre: 3829:0:(import.c:517:import_select_connection()) lustre01-OST0000-osc: tried all connections, increasing latency to 3s
> May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-172.16.100....@tcp: -113
> May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@ffff81012e11e400 x1336168048230441/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269561 ref 2 fl Rpc:N/0/0 rc 0/0
> May 19 13:45:33 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230441 sent from lustre01-OST0000-osc to NID 172.16.100....@tcp 0s ago has failed due to network error (28s prior to deadline).
> May 19 13:45:33 mdt01prdpom kernel: r...@ffff81012e11e400 x1336168048230441/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269561 ref 1 fl Rpc:N/0/0 rc 0/0
> May 19 13:46:31 mdt01prdpom kernel: Lustre: 3829:0:(import.c:517:import_select_connection()) lustre01-OST0000-osc: tried all connections, increasing latency to 4s
> May 19 13:46:31 mdt01prdpom kernel: LustreError: 167-0: This client was evicted by lustre01-OST0000; in progress operations using this service will fail.
> May 19 13:46:31 mdt01prdpom kernel: Lustre: 4099:0:(quota_master.c:1716:mds_quota_recovery()) Only 0/2 OSTs are active, abort quota recovery
> May 19 13:46:31 mdt01prdpom kernel: Lustre: lustre01-OST0000-osc: Connection restored to service lustre01-OST0000 using nid 172.16.100....@tcp.
> May 19 13:46:31 mdt01prdpom kernel: Lustre: MDS lustre01-MDT0000: lustre01-OST0000_UUID now active, resetting orphans
>
> Is this a timeout problem?
> How can I change the timeout?
>
> Thanks!
>
> Ing. Stefano Elmopi
> Gruppo Darco - Resp. ICT Sistemi
> Via Ostiense 131/L Corpo B, 00154 Roma
>
> cell. 3466147165
> tel. 0657060500
> email: [email protected]
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
Gabriele Paciucci
http://www.linkedin.com/in/paciucci
