Hi all,

I agree regarding Lustre recovery; it works great in practice. After 
prolonged OSS downtime, though, you may notice jobs reaching their time 
limits, i.e. jobs blocked on I/O are killed by the scheduler before they 
actually complete and write their final results. With SLURM, for example, you 
could consider using scontrol suspend/resume during the OSS downtime, which 
sends STOP/CONT signals to the jobs' processes and stops the run-time clock 
while they are suspended.
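
For instance, something along these lines on the SLURM head node, before and 
after the maintenance window (a sketch only; the squeue filters may need 
adjusting for your site):

```shell
# Suspend every running job before taking the OSS pair down
# (-h: no header, -t: filter by job state, -o %A: print job ID only)
for jobid in $(squeue -h -t RUNNING -o %A); do
    scontrol suspend "$jobid"
done

# ... OSS maintenance happens here ...

# Resume the suspended jobs once the OSTs are back
for jobid in $(squeue -h -t SUSPENDED -o %A); do
    scontrol resume "$jobid"
done
```

Suspended time does not count against the jobs' time limits, so nothing gets 
killed mid-maintenance.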

All the best,

Stephane

> On Feb 19, 2016, at 1:22 PM, Stearman, Marc <[email protected]> wrote:
> 
> I agree with Oleg.  All of our file systems are configured with OSS nodes in 
> failover pairs, and if one node dies, Lustre will run on the backup node 
> quite well.  Occasionally, though, we have to do a repair on the underlying 
> storage, in which case we power down both OSS nodes and do the repairs.  
> This usually takes less than 15 minutes, but we have had times where both 
> nodes are down for an hour or more.  All I/O destined for those OSTs will 
> hang until they are back online, and recovery usually completes fine and 
> replays all the data.  This is with 4000+ clients connected to the file 
> systems.
> 
> Note that any clients that reboot or crash while those OSTs are offline will 
> not be recoverable, but any clients that stay up through the entire repair 
> window should pause and then recover once the hardware has been fixed.  You 
> should not have to kill or STOP any processes using the file system.
> 
> -Marc
> 
> ----
> D. Marc Stearman
> Lustre Operations Lead
> [email protected]
> Office:  925-423-9670
> Mobile:  925-216-7516
> 
> 
> 
> 
>> On Feb 19, 2016, at 12:11 PM, Drokin, Oleg <[email protected]> wrote:
>> 
>> Hello!
>> 
>>  Actually I have to disagree.
>>  If the servers go down, but then come back up and complete the recovery 
>> successfully, the locks would be replayed and it all should work 
>> transparently.
>>  Clients would "pause" trying to access those servers for as long as needed 
>> until the servers come back again.
>> 
>>  Also, file descriptors are something between the MDS and clients, so if an 
>> OST goes down, file descriptors would not be affected.
>> 
>>  That said, leaving the MDS up while some OSTs go down for a potentially 
>> prolonged time is not that great an idea, and it might make sense to 
>> deactivate those OSTs on the MDS (before bringing the OSTs down) and 
>> reactivate them once they are back.
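>> 
>>  For reference, on the MDS the deactivation looks roughly like this (the 
>> device number is illustrative; take it from lctl dl on your own MDS):

```shell
# On the MDS: list configured devices and find the OSC entries
# for the OSTs that are going down
lctl dl | grep osc
# Deactivate the matching OSC device (device number 15 is hypothetical)
lctl --device 15 deactivate
# ... bring the OSS/OSTs down, do the repair, bring them back ...
# Reactivate once the OST is healthy again
lctl --device 15 activate
```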
>> 
>> Bye,
>>   Oleg
>> On Feb 19, 2016, at 2:53 PM, Patrick Farrell wrote:
>> 
>>> Paul,
>>> 
>>> I would say this is not very likely to work and could easily result in 
>>> corrupted data.  With the servers going down completely, the clients will 
>>> lose the locks they had (no possibility of recovery with the servers down 
>>> completely like this), and any data not written out will be lost.  You can 
>>> guarantee the processes are idle with SIGSTOP, yes, but you can't guarantee 
>>> all of the data has been written out.
>>> 
>>> There are other possible issues as well, but I don't think it's necessary 
>>> to detail them all.  I would strongly advise against this plan.  Instead, 
>>> truly stop activity on the clients and unmount Lustre (to be certain), then 
>>> remount it after the maintenance is complete.
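>>> 
>>> On each client that would look roughly like this (mount point and MGS NID 
>>> are placeholders for your site's values):

```shell
# On every client: after user activity is stopped, cleanly unmount Lustre
umount /mnt/lustre
# ... server-side maintenance happens here ...
# Remount once the servers are back and recovery is done
mount -t lustre mgsnode@tcp0:/testfs /mnt/lustre
```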
>>> 
>>> - Patrick
>>> On 02/19/2016 01:45 PM, Paul Brunk wrote:
>>>> Hi all:
>>>> 
>>>> We have a Linux cluster (CentOS 6.5, Lustre 1.8.9-wcl) which mounts a
>>>> Lustre FS from CentOS-based server appliance (Lustre 2.1.0).
>>>> 
>>>> The Lustre cluster has 4 OSSes as two failover pairs. Due to bad luck
>>>> we have one OSS unbootable, and replacing it will require taking its
>>>> live partner down too (though not any of the other Lustre servers).
>>>> 
>>>> We can prevent I/O to the Lustre FS by suspending (kill -STOP) the
>>>> user processes on the cluster compute nodes before the maintenance
>>>> work, and resuming them (kill -CONT) afterwards.
>>>> 
>>>> I don't know what would happen, though, in those cases where the
>>>> STOP'd process has an open file descriptor on the Lustre FS. If the
>>>> relevant OSS/OSTs become unavailable, and then available again, during
>>>> the STOP'd time, what would happen when the process is CONT'd?
>>>> 
>>>> I tried a Web search on this, but the best I could find was stuff
>>>> which assumed that one of a failover partner set would remain
>>>> available, or was specifically about evictions (which I guess are a
>>>> risk of this maintenance procedure anyway). I did find one doc (
>>>> http://wiki.lustre.org/Lustre_Resiliency:_Understanding_Lustre_Message_Loss_and_Tuning_for_Resiliency
>>>> ) which suggested that silent data corruption was a possibility in the
>>>> event of evictions.
>>>> 
>>>> But what about non-evicted clients with open filehandles?
>>>> 
>>>> Thanks for any insight!
>>>> 
>>> 
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> 
> 

