Mag Gam wrote:
> CliffW:
>
> This helps out a lot!
>
> We still have problems determining devices. We don't know what their
> numbers are (I've been using lctl dl), but I don't know how to activate
> or deactivate them.
>
>
> Do you have an example?
>
Yup
http://manual.lustre.org/manual/LustreManual16_HTML/KnowledgeBase.html#50544717_84403
The .pdf version I think has more details.
cliffw
>
> TIA
>
> On Thu, Aug 7, 2008 at 10:59 AM, Cliff White <[EMAIL PROTECTED]> wrote:
>> Mag Gam wrote:
>>> We do a lot of fluid simulations at my university, but on a similar
>>> note I would like to know what the Lustre experts will do in
>>> particular simulated scenarios...
>>>
>>> The environment is this:
>>> 30 Servers (All Linux)
>>> 1000+ Clients (All Linux)
>>>
>>> 30 Servers
>>> 1 MDS
>>> 30 OSTs each with 2TB of storage
>>>
>>> No failover capabilities.
>>>
>>>
>>> Scenario 1:
>>> Your client is trying to mount a Lustre filesystem using the lustre
>>> module, and it hung. Do what?
>> Answer 0 to all questions:
>> "Read the Lustre Manual. File doc bugs in Lustre Bugzilla if there's a part
>> you don't understand, or a part missing"
>>
>> Answer 1 for all your questions.
>> "Check syslogs/consoles on the impacted clients.
>> Check syslogs/consoles on _all_ Lustre servers.
>> Pay careful attention to timestamps.
>> Work backwards to the first error."
>>
>> Is the problem restricted to one client or seen by multiple clients?
>> If multiple clients, start with the network, use lctl ping to check Lustre
>> connectivity.
>> If a single client, it's generally a client config/network config issue.
>>> Scenario 2:
>>> Your MDS won't mount up. It's saying, "The server is already running".
>>> You try to mount it up a couple of times and still it's not
>> Be certain the server is not already running.
>> Be certain no hung mount processes exist.
>> Unload all Lustre modules (the lustre_rmmod script will do this)
>> Retry and -> answer 1
>>
>>> Scenario 3:
>>> OST/OSS reboots due to a power outage. Some files are striped on this,
>>> and some aren't. What happens? What to do for minimal outage?
>> - Clients can be mounted with a dead OST using the exclude options to the
>> mount command. lfs getstripe can be run from clients to find files
>> on the bad OST. See answer 0 for detailed process.
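For the device-number question and the dead-OST case in Scenario 3, a rough sketch of how the pieces might fit together. It only prints commands, it does not run them. The device listing, NID, filesystem name and mount point below are made-up samples, and the column layout assumed for lctl dl (index, state, type, name, UUID, refcount) should be checked against what your own nodes print. The helper names are hypothetical. Check the manual for the exact procedure before running anything.

```shell
#!/bin/sh
# Sketch only: these helpers print commands, they never execute them.

# osc_deactivate_cmd OSTNAME
# Reads `lctl dl`-style lines on stdin (assumed columns: index, state,
# type, name, uuid, refcount) and prints a deactivate command for each
# osc device whose name contains OSTNAME.
osc_deactivate_cmd() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost) > 0 {
        print "lctl --device " $1 " deactivate"
    }'
}

# exclude_mount_cmd OSTNAME MGSNID FSNAME MOUNTPOINT
# Prints a client mount command that skips the named OST.
exclude_mount_cmd() {
    printf 'mount -t lustre -o exclude=%s %s:/%s %s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical listing, standing in for real `lctl dl` output on a client.
sample_dl='  0 UP mgc MGC10.0.0.1@tcp 6f1c-uuid 5
  1 UP lov testfs-clilov-0001 a1b2-uuid 4
  2 UP mdc testfs-MDT0000-mdc-0001 a1b2-uuid 5
  3 UP osc testfs-OST0000-osc-0001 a1b2-uuid 5
  4 UP osc testfs-OST0001-osc-0001 a1b2-uuid 5'

printf '%s\n' "$sample_dl" | osc_deactivate_cmd testfs-OST0001
exclude_mount_cmd testfs-OST0001 10.0.0.1@tcp testfs /mnt/testfs
```

On a live node you would pipe the real listing in instead of the sample, e.g. `lctl dl | osc_deactivate_cmd testfs-OST0001`, and review the printed line before running it. Replacing `deactivate` with `activate` in the printed command brings the device back once the OST has returned.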
>>> Scenario 4:
>>> lctl dl shows some devices in "ST" state. What does that mean, and how
>>> do I clear it?
>> ST = stopped.
>> Clear this by cleaning up all devices (answer 0)
>> or restarting the stopped devices.
>> Usually indicates an error/issue with the stopped device, so see
>> answer 1.
>>>
>>> I know some of these scenarios may be ambiguous, but please let me
>>> know which so I can further elaborate. I am eventually planning to
>>> wiki this for future reference and other Lustre newbies.
>> Please contribute to wiki.lustre.org - there is considerable information
>> there already, and a decent existing structure.
>>> If anyone else has any other scenarios, please don't be shy and ask
>>> away. We can create a good troubleshooting doc similar to the
>>> operations manual.
>> Again, please file doc bugs at bugzilla.lustre.org and contribute to
>> wiki.lustre.org, hope this helps!
>> cliffw
>>
>>>
>>> TIA
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
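For Scenario 4, a small sketch that picks the non-UP devices out of a device listing, so you know which device names to go looking for in the logs. The listing is a made-up sample, the column layout (index, state, type, name, UUID, refcount) is an assumption to verify against your own lctl dl output, and the function name is hypothetical.

```shell
#!/bin/sh
# not_up_devices
# Reads `lctl dl`-style lines on stdin and prints "index state name"
# for every device whose state column is anything other than UP.
not_up_devices() {
    awk 'NF >= 4 && $2 != "UP" { print $1, $2, $4 }'
}

# Hypothetical listing, standing in for real `lctl dl` output on an OSS.
sample_oss_dl='  0 UP mgc MGC10.0.0.1@tcp 6f1c-uuid 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter testfs-OST0002 testfs-OST0002_UUID 7
  3 ST obdfilter testfs-OST0003 testfs-OST0003_UUID 3'

printf '%s\n' "$sample_oss_dl" | not_up_devices
# prints: 3 ST testfs-OST0003
```

On a real server the same filter would be fed with `lctl dl | not_up_devices`. The names it prints are the ones to search for in syslog, working backwards to the first error as in answer 1.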
