David:

On 3/14/07, David Vasil <[EMAIL PROTECTED]> wrote:
> What are people doing for failover (at the lustre layer) under 1.4.X series lustre? Specifically the failing of OSTs between a failed host and its failover pair.
The failover bit is easily controlled via lconf and grouping of nodes. The issue, as you list it further on:

> Under 1.4.9 I have found that the --group feature to lconf does not appear to work. Likewise I have had issues with "lconf --cleanup --force --service <ost> <config file>" trying to unload all of lustre modules on a running OSS (which leaves the OSS in somewhat of a bad state).

is exactly what I am facing as well. Recently, on a 1.4.9 cluster, while trying to fail an OST back to the primary OSS, the secondary OSS which had taken over services refused to give them up. Unfortunately my time on that cluster was limited, and I am now setting 1.4.9 up on a few new systems to carry on testing. I will update you (and all) within 2 days, hopefully.

If anyone else can chime in on what David and I may be doing wrong given the lconf commands listed above, it would greatly help.

Regards,
--
Mustafa A. Hashmi
[EMAIL PROTECTED]
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
