Hi everybody,

We just hit the same problem last evening: the OSTs were suddenly disconnecting from the OSS.
I noticed that we had manually limited the number of OSS threads to 128, although we export 4 OSTs from that server and the file system is mounted by about 100 clients. I suspect this may be the issue. Were you able to find the reason for your errors? I will now remove this thread limit and see whether it helps.

Kind regards
Reto Gantenbein

On Aug 13, 2008, at 3:39 PM, Alex Lee wrote:
> I have a system that's been spitting out OST disconnect messages under
> heavy load. I'm guessing the OST eventually reconnects.
> I want to say this happens when the OSS is extremely overloaded, but I
> did notice this happening even under light load. Only the OSS seems to
> spit out any error messages. I don't see anything on the client side.
>
> Should I be concerned? Or does this typically happen at other sites too?
>
> -Alex
>
> Clip from one of the OSSes:
>
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available for connect (no target)
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8101f4570600 x54/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218616308 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 3 previous similar messages
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: Skipped 3 previous similar messages
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available for connect (no target)
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff81010fc86600 x50/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218617636 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous
> similar message
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: Skipped 1 previous similar message
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0005_UUID' is not available for connect (no target)
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff81022861b400 x49/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218621159 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
>
> Different OSS:
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST0050_UUID' is not available for connect (no target)
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3b79a00 x124/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539929 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST004f_UUID' is not available for connect (no target)
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3e92a00 x125/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539935 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 137-5: UUID
> 'lfs-OST004f_UUID' is not available for connect (no target)
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3983c00 x125/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539938 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 5 previous similar messages
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
Universität Bern
Abt. Informatikdienste (IT Services Department)
Gruppe Zentrale Systeme (Central Systems Group)
Reto Gantenbein
Administrator UBELIX
Gesellschaftsstrasse 6
CH-3012 Bern
Room -104
Tel. +41 (0)31 631 87 97
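For anyone triaging the same symptom, here is a minimal Python sketch (an illustration only, not a Lustre tool) that tallies the `137-5 ... is not available for connect (no target)` refusals per OSS host and OST UUID, and decodes the `processing error (-19)` return code with the standard `errno` module (on Linux, errno 19 is `ENODEV`, "No such device"). It assumes unwrapped syslog input with one message per line, as in the OSS's own kernel log, rather than the line-wrapped copy in the mail. The regular expressions are written against the excerpt quoted above and are assumptions that may need adjusting for other Lustre versions or syslog formats; `summarize` and `describe_rc` are hypothetical helper names.

```python
import errno
import os
import re
from collections import Counter

# Assumption: one syslog message per line, e.g. the quoted
# "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID
#  'lfs-OST0004_UUID' is not available for connect (no target)"
# joined back onto a single line.
NO_TARGET = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+LustreError:\s+137-5:\s+"
    r"UUID\s+'(?P<uuid>[^']+)'\s+is not available for connect \(no target\)"
)

# Captures the negative return code in "@@@ processing error (-19)".
PROCESSING_ERROR = re.compile(r"processing error \((?P<rc>-\d+)\)")


def summarize(lines):
    """Count no-target refusals per (host, OST UUID) and processing errors per rc."""
    refusals = Counter()
    rcs = Counter()
    for line in lines:
        m = NO_TARGET.match(line)
        if m:
            refusals[(m.group("host"), m.group("uuid"))] += 1
        m = PROCESSING_ERROR.search(line)
        if m:
            rcs[int(m.group("rc"))] += 1
    return refusals, rcs


def describe_rc(rc):
    """Turn a negative kernel return code such as -19 into 'rc (NAME: message)'."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} ({name}: {os.strerror(code)})"


if __name__ == "__main__":
    # Three messages from the excerpt above, each unwrapped onto one line.
    sample = [
        "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID "
        "'lfs-OST0004_UUID' is not available for connect (no target)",
        "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: "
        "11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error "
        "(-19) [EMAIL PROTECTED] fff8101f4570600 x54/t0 o8-><?>@<?>:0/0 lens 240/0 "
        "e 0 to 0 dl 1218616308 ref 1 fl Interpret:/0/0 rc -19/0",
        "Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 137-5: UUID "
        "'lfs-OST004f_UUID' is not available for connect (no target)",
    ]
    refusals, rcs = summarize(sample)
    for (host, uuid), n in sorted(refusals.items()):
        print(f"{host} {uuid}: {n} no-target refusal(s)")
    for rc, n in sorted(rcs.items()):
        print(f"{describe_rc(rc)}: {n} processing error(s)")
```

To run it against a real log, pass an open file (for example the OSS's kernel log; the path depends on your syslog setup) to `summarize()` instead of the `sample` list. Grouping by host and UUID makes it easy to see whether the refusals concentrate on particular OSTs or are spread across servers.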
