Hi all,

I agree regarding Lustre recovery; it works just great in practice. After prolonged OSS downtime, though, you may notice jobs reaching their time limits: jobs blocked on I/O get killed by the scheduler before they can complete and write their final results. With SLURM, for example, you could consider using scontrol suspend/resume for the duration of the OSS downtime; it sends STOP/CONT signals to the job's processes and correctly holds the job's run-time clock.
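A rough sketch of what that could look like (the job selection below is illustrative; scontrol suspend/resume must be run as root or the SlurmUser):

    # Before taking the OSS pair down: suspend all running jobs
    squeue -h -t RUNNING -o '%i' | xargs -r -n1 scontrol suspend

    # ... OSS maintenance ...

    # Afterwards: resume whatever was suspended
    squeue -h -t SUSPENDED -o '%i' | xargs -r -n1 scontrol resume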
All the best,
Stephane

> On Feb 19, 2016, at 1:22 PM, Stearman, Marc <[email protected]> wrote:
>
> I agree with Oleg. All of our file systems are configured with OSS nodes in
> failover pairs, and if one node dies, Lustre will run on the backup node
> quite well. Occasionally, though, we have to do a repair on the underlying
> storage, in which case we power down both OSS nodes and do the repairs.
> This usually takes less than 15 minutes, but we have had times where both
> nodes were down for an hour or more. All I/O destined for those OSTs will
> hang until they are back online, and usually recovery completes fine and
> replays all the data. This is with 4000+ clients connected to the file
> systems.
>
> Note that any clients that reboot or crash while those OSTs are offline will
> not be recoverable, but any clients that stay up through the entire repair
> window should pause and then recover once the hardware has been fixed. You
> should not have to kill or STOP any processes using the file system.
>
> -Marc
>
> ----
> D. Marc Stearman
> Lustre Operations Lead
> [email protected]
> Office: 925-423-9670
> Mobile: 925-216-7516
>
>> On Feb 19, 2016, at 12:11 PM, Drokin, Oleg <[email protected]> wrote:
>>
>> Hello!
>>
>> Actually, I have to disagree.
>> If the servers go down, but then come back up and complete recovery
>> successfully, the locks will be replayed and it should all work
>> transparently.
>> Clients will "pause" trying to access those servers for as long as needed
>> until the servers come back again.
>>
>> Also, file descriptors are something between the MDS and the clients, so
>> if an OST goes down, open file descriptors are not affected.
>>
>> That said, leaving the MDS up while some OSTs go down for a potentially
>> prolonged time is not that great an idea, and it might make sense to
>> deactivate those OSTs on the MDS (before bringing the OSTs down) and
>> reactivate them once they are back.
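>> Roughly like this (the device names here are only examples; run
>> "lctl dl" on the MDS to see the actual OSC device names for the
>> affected OSTs):
>>
>>    # On the MDS, before taking the OSTs down:
>>    lctl --device lustre-OST0002-osc deactivate
>>    lctl --device lustre-OST0003-osc deactivate
>>
>>    # ... OSS maintenance ...
>>
>>    # Once the OSTs are back and recovery has completed:
>>    lctl --device lustre-OST0002-osc activate
>>    lctl --device lustre-OST0003-osc activate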
>> Bye,
>> Oleg
>>
>> On Feb 19, 2016, at 2:53 PM, Patrick Farrell wrote:
>>
>>> Paul,
>>>
>>> I would say this is not very likely to work and could easily result in
>>> corrupted data. With the servers going down completely, the clients will
>>> lose the locks they held (there is no possibility of recovery with the
>>> servers down completely like this), and any data not written out will be
>>> lost. You can guarantee the processes are idle with SIGSTOP, yes, but you
>>> cannot guarantee that all of the data has been written out.
>>>
>>> There are other possible issues as well, but I don't think it's necessary
>>> to detail them all. I would strongly advise against this plan. Just truly
>>> stop activity on the clients and unmount Lustre (to be certain), then
>>> remount it after the maintenance is complete.
>>>
>>> - Patrick
>>>
>>> On 02/19/2016 01:45 PM, Paul Brunk wrote:
>>>> Hi all:
>>>>
>>>> We have a Linux cluster (CentOS 6.5, Lustre 1.8.9-wc1) which mounts a
>>>> Lustre FS from a CentOS-based server appliance (Lustre 2.1.0).
>>>>
>>>> The Lustre cluster has 4 OSSes in two failover pairs. Due to bad luck
>>>> we have one OSS unbootable, and replacing it will require taking its
>>>> live partner down too (though not any of the other Lustre servers).
>>>>
>>>> We can prevent I/O to the Lustre FS by suspending (kill -STOP) the
>>>> user processes on the cluster compute nodes before the maintenance
>>>> work, and resuming them (kill -CONT) afterwards.
>>>>
>>>> I don't know what would happen, though, in those cases where a
>>>> STOP'd process has an open file descriptor on the Lustre FS. If the
>>>> relevant OSS/OSTs become unavailable, and then available again, during
>>>> the STOP'd time, what would happen when the process is CONT'd?
>>>>
>>>> I tried a Web search on this, but the best I could find was material
>>>> that assumed one of a failover partner set would remain available,
>>>> or was specifically about evictions (which I guess are a risk of this
>>>> maintenance procedure anyway). I did find one doc (
>>>> http://wiki.lustre.org/Lustre_Resiliency:_Understanding_Lustre_Message_Loss_and_Tuning_for_Resiliency
>>>> ) which suggested that silent data corruption was a possibility in the
>>>> event of evictions.
>>>>
>>>> But what about non-evicted clients with open filehandles?
>>>>
>>>> Thanks for any insight!
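For completeness, the unmount/remount approach Patrick recommends above would look roughly like this on each client (the mount point, MGS NID, and fsname are only examples; substitute your own):

    # On each client, once jobs are stopped and the FS is idle:
    umount /mnt/lustre

    # ... perform the OSS maintenance ...

    # Remount after the servers are back and recovery has completed:
    mount -t lustre mgs01@tcp0:/lustre /mnt/lustre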
