Mag Gam wrote:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-March/009928.html
>
> Look familiar?

Yes, I've read the thread - that's why I addressed you in addition to the list ;-)
But I was not aware that this is supposed to be a bug in this particular Lustre
version. Right now the MDT stops cooperating without any ll_mdt processes going
up. The load on the MDT is around 0.5, but no connections are possible. In the
log I only noticed some "still busy with 2 active RPCs" messages. I just hope I
don't have to writeconf the MDT again - I learned on this list that this would
be necessary if these RPCs never finish.

Regards,
Thomas

> On Fri, Jul 3, 2009 at 7:32 AM, Thomas Roth<[email protected]> wrote:
>> Hi,
>>
>> I didn't take notice of a discussion of such problems with 1.6.7.1. Do
>> you know anything more specific about it? We don't want to downgrade,
>> since our users are happier after the last upgrade (1.6.5 -> 1.6.7), and
>> we don't have the 1.6.7.2 (Debian) packages yet. But I could try to
>> speed that up and force an upgrade if you told me that 1.6.7.1 wasn't
>> really reliable.
>>
>> For the moment the problem seems to have been fixed by shutdown,
>> fs-check and writeconf of all servers.
>> However, I don't want to do that every other week ...
>>
>> Thanks a lot for your help,
>> Thomas
>>
>> Mag Gam wrote:
>>> Hi Tom:
>>>
>>> There was a known issue with 1.6.7.1. What I did was downgrade to
>>> 1.6.6 and everything worked well. Or you can try upgrading, but there
>>> is something definitely wrong with that version...
>>>
>>> If you like, I can help you offline. I should be free this weekend (I
>>> have a long weekend).
>>>
>>> On Thu, Jul 2, 2009 at 8:22 AM, Thomas Roth<[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> our MDT gets stuck and unresponsive with very high load (Lustre
>>>> 1.6.7.1, kernel 2.6.22, 8 cores, 32 GB RAM). The only thing calling
>>>> attention to itself is one ll_mdt_?? process running at 100% CPU.
>>>> Nothing unusual was happening on the cluster before that.
>>>> After a reboot, as well as after moving the service to another
>>>> server, this behavior reappears.
>>>> The initial stages - mounting MGS, mounting MDT,
>>>> recovery - work fine, but then the load goes up and the system is
>>>> rendered unusable.
>>>>
>>>> Atm I don't know what to do, except shutting down all servers and
>>>> possibly doing a writeconf everywhere.
>>>>
>>>> I see that a similar problem was reported by Mag in March this year,
>>>> but no clues or solutions appeared.
>>>> Any ideas?
>>>>
>>>> Yours,
>>>> Thomas
>>>>
>> --
>> --------------------------------------------------------------------
>> Thomas Roth
>> Department: Informationstechnologie
>> Location: SB3 1.262
>> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>> Planckstraße 1
>> D-64291 Darmstadt
>> www.gsi.de
>>
>> Gesellschaft mit beschränkter Haftung
>> Sitz der Gesellschaft: Darmstadt
>> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>
>> Geschäftsführer: Professor Dr. Horst Stöcker
>>
>> Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
>> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
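For reference, the "shutdown, fs-check and writeconf of all servers" cycle mentioned in this thread looks roughly like the following on a Lustre 1.6 / ldiskfs setup. This is a sketch only: the device and mount paths are placeholders for your actual targets, and note that --writeconf regenerates the configuration logs, so parameters previously set with lctl conf_param have to be reapplied afterwards.

```sh
# 1. Unmount everything: clients first, then OSTs, then MDT, then MGS.
umount /mnt/lustre        # on each client
umount /mnt/ost0          # on each OSS, for every OST it serves
umount /mnt/mdt           # on the MDS
umount /mnt/mgs           # on the MGS, if it is a separate target

# 2. Check the backing filesystems (placeholder devices).
e2fsck -f /dev/sdX        # on every server, for every target

# 3. Regenerate the configuration logs on every target.
tunefs.lustre --writeconf /dev/sdX   # MGS/MDT and all OSTs

# 4. Remount in order: MGS first, then MDT, then OSTs, then clients.
```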
