There is a somewhat hidden danger with eviction: you can get silent data loss.
The simplest example is buffered writes (i.e., any write that isn't direct I/O).
Lustre reports completion (i.e., your write() syscall returns) once the data is
in the page cache on the client. Any modern file system behaves this way,
including local ones: you can get silent data loss on ext4, XFS, ZFS, etc., if
your disk becomes unavailable before the data is written out of the page cache.

So if that client with pending dirty data is evicted from the OST the data is 
destined for - which is essentially what abort recovery does - that data is 
lost, and the application doesn't get an error (because the write() call has 
already completed).

A message is printed to the console on the client in this case, but you have to 
know to look for it.  The application will run to completion, but you may still 
experience data loss, and not know it.  It's just something to keep in mind - 
applications that run to completion despite evictions may not have completed 
*successfully*.
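
One way an application can narrow this window is to call fsync() before it
treats its output as written, since fsync() is where the kernel gets a chance
to report a writeback failure that write() could not. Below is a minimal
sketch in Python. The helper name write_durably and the file name are
hypothetical, and the expectation that fsync() fails (typically with EIO)
after an eviction has discarded dirty pages is an assumption worth confirming
on your own Lustre version.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync() has succeeded.

    Hypothetical helper for illustration. Raises OSError if write(),
    fsync(), or close() reports a failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() returning only means the data reached the page cache;
            # it may also write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # fsync() blocks until the kernel has pushed the dirty pages to
        # storage, and raises OSError if that writeback failed.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo against a temporary directory; on a real system the path would
    # be a file on the Lustre mount.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "checkpoint.dat")
        try:
            write_durably(target, b"results\n")
        except OSError as exc:
            print(f"output NOT safely written: {exc}")
        else:
            print("output flushed to storage")
```

The cost is that each call waits on the servers, so this fits checkpoint or
final-output files better than many small writes.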

- Patrick

On 10/19/18, 11:42 AM, "lustre-discuss on behalf of Mohr Jr, Richard Frank 
(Rick Mohr)" <[email protected] on behalf of 
[email protected]> wrote:

    
    > On Oct 19, 2018, at 10:42 AM, Marion Hakanson <[email protected]> wrote:
    > 
    > Thanks for the feedback.  You're both confirming what we've learned so
    > far, that we had to unmount all the clients (which required rebooting most of
    > them), then reboot all the storage servers, to get things unstuck until the
    > problem recurred.
    >
    > I tried abort_recovery on the clients last night, before rebooting the
    > MDS, but that did not help.  Could well be I'm not using it right:
    
    Aborting recovery is a server-side action, not something that is done on
    the client.  As mentioned by Peter, you can abort recovery on a single target
    after it is mounted by using "lctl --device <DEV> abort_recovery".  But you can
    also just skip over the recovery step when the target is mounted by adding the
    "-o abort_recov" option to the mount command.  For example,
    
    mount -t lustre -o abort_recov /dev/my/mdt /mnt/lustre/mdt0
    
    And similarly for OSTs.  So you should be able to just unmount your MDT/OST
    on the running file system and then remount it with the abort_recov option.
    From a client perspective, the Lustre client will get evicted but should
    automatically reconnect.
    
    Some applications can ride through a client eviction without causing
    issues, some cannot.  I think it depends largely on how the application does
    its I/O and whether there is any I/O in flight when the eviction occurs.  I
    have had to do this a few times on a running cluster, and in my experience, we
    have had good luck with most of the applications continuing without issues.
    Sometimes there are a few jobs that abort, but overall this is better than
    having to stop all jobs and remount Lustre on all the compute nodes.
    
    --
    Rick Mohr
    Senior HPC System Administrator
    National Institute for Computational Sciences
    http://www.nics.tennessee.edu
    
    _______________________________________________
    lustre-discuss mailing list
    [email protected]
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    
