I don't know which minimum FS version you need to make use of -N, but there is 
this Marc guy watching the mailing list who would know :-)

Sven


On 10/18/18, 11:50 AM, "Peter Childs" 
<[email protected] on behalf of [email protected]> 
wrote:

    Thanks Sven, that's one of the best answers I've seen and probably closer 
to why we sometimes can't take snapshots under normal circumstances as well.
    
    We're currently running the restripe with "-N" so it only runs on a few 
nodes and does not disturb the work of the cluster, which is why we hadn't 
noticed it slowing the storage down too much.
    
    I've also tried to put some QoS settings on it too. I always find QoS a 
little bit "trial and error", but 30,000 IOPS looks to be making the rebalance 
run at about 2/3 of the IOPS it was using with no QoS limit. Just out of 
interest, which version do I need to be running for "mmchqos -N" to work? I 
tried it to limit a set of nodes and it says it's not supported by my 
filesystem version; the manual does not seem to say.
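    For the record, the sort of QoS commands I mean look roughly like this 
(syntax sketched from memory, so treat it as an illustration; the pool name 
and whether the -N form is accepted depend on the filesystem version):

    ```shell
    # Cap maintenance-class I/O (restripe, rebalance, ...) at 30,000 IOPS
    # across all pools; check mmchqos(8) for the exact syntax.
    mmchqos home --enable pool=*,maintenance=30000IOPS

    # What I tried for a subset of nodes (rejected as not supported by my
    # filesystem version; node names are placeholders):
    mmchqos home --enable -N node1,node2 pool=*,maintenance=5000IOPS

    # Inspect the current limits and measured usage
    mmlsqos home
    ```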
    
    Even with a very, very small value for QoS on maintenance tasks, I still 
can't take snapshots, so as Sven says the buffers are getting dirty too quickly.
    
    I have thought before that making snapshot creation more reliable would be 
nice. I'd not really thought it would be possible; I guess it's time to write 
another RFE.
    
    Peter Childs
    Research Storage
    ITS Research Infrastructure
    Queen Mary, University of London
    
    ________________________________________
    From: [email protected] 
<[email protected]> on behalf of Sven Oehme 
<[email protected]>
    Sent: Thursday, October 18, 2018 7:09:56 PM
    To: gpfsug main discussion list; [email protected]
    Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping
    
    Peter,
    
    If the two operations were incompatible you would have gotten a different 
message.
    To understand what the message means, one needs to understand how the 
snapshot code works.
    When GPFS wants to take a snapshot it goes through multiple phases. It 
first flushes all dirty data once, then flushes any new dirty data a second 
time, and then tries to quiesce the filesystem. How this is done is quite 
complex, so let me try to explain.
    
    How much parallelism is used for the two sync phases is controlled by the 
sync worker settings:
    
      sync1WorkerThreads 64
      sync2WorkerThreads 64
      syncBackgroundThreads 64
      syncWorkerThreads 64
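    (If you want to see what your cluster is actually using, mmdiag dumps the 
live daemon configuration; the grep is just a filter, and the parameter names 
are as I remember them:)

    ```shell
    # Show the current sync worker settings on this node
    mmdiag --config | grep -i sync
    ```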
    
    If my memory serves me correctly, the sync1 number is for the first flush 
and sync2 for the second flush, while syncWorkerThreads are used explicitly by 
e.g. mmcrsnapshot to flush dirty data (I am sure somebody from IBM will 
correct me if I state something wrong; I have mixed them up before).
    
    Background flushing of dirty data is triggered by the OS:
    
    root@dgx-1-01:~# sysctl -a |grep -i vm.dirty
    vm.dirty_background_bytes = 0
    vm.dirty_background_ratio = 10
    vm.dirty_bytes = 0
    vm.dirty_expire_centisecs = 3000
    vm.dirty_ratio = 20
    vm.dirty_writeback_centisecs = 500   <--- this is 5 seconds
    
    as well as GPFS settings :
    
      syncInterval 5
      syncIntervalStrict 0
    
    Here both are set to 5 seconds, so every 5 seconds a periodic background 
flush happens.
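    A quick way to check both sides (the values sketched here are from my 
system; yours may differ):

    ```shell
    # OS-side periodic writeback interval, in centiseconds (500 = 5s)
    sysctl vm.dirty_writeback_centisecs
    # GPFS-side periodic sync interval, in seconds
    mmlsconfig syncInterval
    ```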
    
    Why explain all this? Because it is very easy for a thread doing buffered 
I/O to make data dirty; a single thread can do hundreds of thousands of I/Os 
into memory, so making data dirty is very easy. The threads described above 
need to clean all of this up, meaning stabilize it onto media, and this is 
where it gets complicated. You are already running a rebalance, which puts a 
lot of load on the disks; on top of that I assume you don't have an idle 
filesystem, so people keep making data dirty while the threads above compete 
to flush it. It's a battle they can't really win unless you have very fast 
storage, or at least very fast and large caches in the storage, so that the 
64 threads in the example above can clean data faster than new data gets 
dirtied.
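    As a purely hypothetical back-of-envelope illustration (all numbers 
invented), compare the rate at which writers dirty memory against what the 
flush threads can stabilize:

    ```shell
    # One buffered writer dirtying 100,000 I/Os per second in memory vs.
    # 64 flush threads each stabilizing 500 I/Os per second to disk.
    # A positive backlog means the flushers fall behind and the snapshot
    # quiesce keeps getting delayed.
    writers=1; dirty_per_writer=100000
    flushers=64; flush_per_thread=500
    backlog=$(( writers * dirty_per_writer - flushers * flush_per_thread ))
    echo "backlog: $backlog I/Os per second"   # 68000 here: flushers lose
    ```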
    
    So your choices are:
    1. Reduce the worker threads, so less data gets dirty.
    2. Turn writes into stable writes: mmchconfig forceOSyncWrites=yes (you 
can use -I while running). This will slow down all write operations on your 
system, as all writes are now done synchronously, but because of that they 
can't make anything dirty, so the flushers don't have to do any work.
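    Concretely, option 2 would look something like this (the -I semantics as 
I remember them: apply immediately without persisting; double-check before 
running, and undo it once the restripe is done):

    ```shell
    # Force stable (synchronous) writes cluster-wide for the duration
    mmchconfig forceOSyncWrites=yes -I
    # ... run the restripe / take the snapshot ...
    mmchconfig forceOSyncWrites=no -I
    ```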
    
    While I was still at IBM I proposed changing the code to switch into 
O_SYNC mode dynamically between sync1 and sync2. For a second or two all 
writes would then be done synchronously, so nothing new could get dirty and 
the quiesce would not get delayed; as soon as the quiesce had happened, the 
temporarily enforced stable flag would be removed. That proposal never got 
anywhere, as no customer pushed for it. Maybe that would be worth an RFE :-)
    
    
    Btw, I described some of the parameters in more detail here: 
http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf
    Some of that is outdated by now, but it is probably still the best summary 
presentation out there.
    
    Sven
    
    On 10/18/18, 8:32 AM, "Peter Childs" 
<[email protected] on behalf of [email protected]> 
wrote:
    
        We've just added 9 RAID volumes to our main storage (5 RAID6 arrays
        for data and 4 RAID1 arrays for metadata).
    
        We are now attempting to rebalance our data across all the volumes.
    
        We started with the metadata, doing a "mmrestripe -r", as we'd changed
        the failure groups on our metadata disks and wanted to ensure we had
        all our metadata on known-good SSDs. No issues here; we could take
        snapshots, and I even tested it. (New SSDs on a new failure group, and
        all old SSDs moved to the same failure group.)
    
        We're now doing a "mmrestripe -b" to rebalance the data across all 21
        volumes. However, when we attempt to take a snapshot, as we do every
        night at 11pm, it fails with:
    
        sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test
        Flushing dirty data for snapshot :test...
        Quiescing all file system operations.
        Unable to quiesce all nodes; some processes are busy or holding
        required resources.
        mmcrsnapshot: Command failed. Examine previous error messages to
        determine cause.
    
        Are you meant to be able to take snapshots while re-striping or not?
    
        I know a rebalance of the data is probably unnecessary, but we'd like
        to get the best possible speed out of the system, and we also kind of
        like balance.
    
        Thanks
    
    
        --
        Peter Childs
        ITS Research Storage
        Queen Mary, University of London
    
        _______________________________________________
        gpfsug-discuss mailing list
        gpfsug-discuss at spectrumscale.org
        http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    
    
    
    


