You may want to get an mmtrace, but I suspect that the disk I/Os are simply slow.
The iohist output shows the time from when the I/O was issued until it finished.
Of course, if you have disk I/Os taking 10x longer than they should, other I/Os
are going to queue up behind them. And if there are more I/Os than there are NSD
server threads, some I/Os will sit queued, waiting for a thread.
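
To put rough numbers on that (the figures below are made up, just to show the
shape of the problem): with a fixed pool of NSD server threads, a 10x jump in
per-IO service time is enough to flip a queue that normally drains instantly
into one that grows without bound.

# Toy model (made-up numbers): fixed arrival rate, fixed thread pool,
# per-IO service time.  Each thread can retire 1/service_time IOs per second.
def pending_after(seconds, arrivals_per_sec, threads, service_time_s):
    completions_per_sec = threads / service_time_s
    backlog_growth = arrivals_per_sec - completions_per_sec
    return max(0, backlog_growth * seconds)

# Healthy disks: 5 ms IOs, 348 threads keep up with 10,000 IOs/sec easily.
print(pending_after(60, 10000, 348, 0.005))   # 0 pending
# Same load, but IOs now take 50 ms: roughly 180,000 requests queued in a minute.
print(pending_after(60, 10000, 348, 0.050))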

Jim
 

    On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L 
<[email protected]> wrote:  
 
 Hi All,

First off, my apologies for the delay in responding to the list … we’ve actually 
been working our tails off on this one, trying to collect as much data as we can 
on what is a very weird issue.  While I’m responding to Aaron’s e-mail, I’m going 
to try to address the questions raised in all the responses.

Steve - this all started last week.  You’re correct about our mixed workload.  
There have been no new workloads that I am aware of.

Stephen - no, this is not an ESS.  We are running GPFS 4.2.3-8.

Aaron - no, this is not on a DDN, either.

The hardware setup is a vanilla 8 Gb FC SAN with commodity hardware for the 
servers and storage.  We have two SAN “stacks”, and all NSD servers and storage 
are connected to both stacks.  Linux multipathing handles path failures, and we 
have 10 GbE out to the network.

We were first alerted to this problem by one of our monitoring scripts, which is 
designed to flag abnormally high I/O times.  As I mentioned previously, in our 
environment those have usually been caused by cache battery backup failures in 
the storage array controllers (but _not_ this time).  So I’m getting e-mails 
that in part read:

Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms.
Disk eon34Ensd on nsd4 has a service time of 3146.715 ms.

The “34” tells me which storage array and the “C” or “E” tells me which LUN on 
that storage array.  As I’ve mentioned, those two LUNs are far and away my most 
frequent problem children, but here’s another report from today as well:

Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms.
Disk eon28Ansd on nsd7 has a service time of 1154.002 ms.
Disk eon31Ansd on nsd3 has a service time of 1068.987 ms.
Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms.

NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8.
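
For anyone curious how those alert lines get generated, the check is 
conceptually no more than the sketch below.  This is a simplified stand-in 
rather than our actual script, and the column positions assumed for the 
"mmdiag --iohist" output are guesses that you would want to verify against a 
real dump on your own cluster.

#!/usr/bin/env python
# Simplified sketch: run "mmdiag --iohist" on an NSD server, keep the worst
# completed-IO time seen for each disk, and print an alert line when that
# exceeds a threshold.  The column indexes below are ASSUMPTIONS; check them
# against your own mmdiag output before trusting the numbers.
import socket
import subprocess

THRESHOLD_MS = 1000.0     # anything slower than a second is worth an e-mail
TIME_MS_COLUMN = 5        # assumed position of the "time ms" field
DISK_NAME_COLUMN = 3      # assumed position of the disk / NSD identifier

def worst_io_times():
    out = subprocess.check_output(["mmdiag", "--iohist"], universal_newlines=True)
    worst = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) <= TIME_MS_COLUMN:
            continue                  # header, separator, or blank line
        try:
            ms = float(fields[TIME_MS_COLUMN])
        except ValueError:
            continue                  # not a data line
        disk = fields[DISK_NAME_COLUMN]
        worst[disk] = max(ms, worst.get(disk, 0.0))
    return worst

if __name__ == "__main__":
    host = socket.gethostname()
    for disk, ms in sorted(worst_io_times().items()):
        if ms >= THRESHOLD_MS:
            print("Disk %s on %s has a service time of %.3f ms." % (disk, host, ms))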

Based on Fred’s excellent advice, we took a closer look at the “mmfsadm dump 
nsd” output.  We wrote a Python script to pull out what we think is the most 
pertinent information:

nsd1
29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending.
    348 threads started, 1 threads active, 348 was the highest number of threads active.
29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending.
    348 threads started, 124 threads active, 348 was the highest number of threads active.
nsd2
29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending.
    348 threads started, 13 threads active, 348 was the highest number of threads active.
29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending.
    348 threads started, 340 threads active, 348 was the highest number of threads active.
nsd3
29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending.
    348 threads started, 0 threads active, 348 was the highest number of threads active.
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending.
    348 threads started, 4 threads active, 348 was the highest number of threads active.
nsd4
42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending.
    504 threads started, 8 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending.
    504 threads started, 211 threads active, 504 was the highest number of threads active.
nsd5
42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending.
    504 threads started, 6 threads active, 504 was the highest number of threads active.
42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending.
    504 threads started, 8 threads active, 504 was the highest number of threads active.
nsd6
42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending.
    504 threads started, 0 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending.
    504 threads started, 9 threads active, 504 was the highest number of threads active.
nsd7
42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending.
    504 threads started, 1 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending.
    504 threads started, 13 threads active, 504 was the highest number of threads active.
nsd8
42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending.
    504 threads started, 4 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending.
    504 threads started, 89 threads active, 504 was the highest number of threads active.

Note that we see more “load” on the LARGE queue side of things, and that nsd2 
and nsd4 (the primary NSD servers for the two LUNs that show up most frequently 
in our alerts) are the most heavily loaded.
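
For anyone who wants to do the same, the script is conceptually just the 
roll-up sketched below.  The parsing of the dump text itself is left out; 
summarize() simply rolls per-queue records (the field names here are 
placeholders for whatever your parser extracts from "mmfsadm dump nsd") up 
into summary lines like the ones above.

from collections import defaultdict

def summarize(queues):
    """Roll per-queue records up into per-type summary lines.

    `queues` is an iterable of dicts, one per NSD queue, e.g.
        {"type": "SMALL", "pending": 50, "highest_pending": 3741,
         "threads_started": 12, "threads_active": 0, "highest_active": 12}
    i.e. whatever your parser pulls out of "mmfsadm dump nsd".
    """
    totals = defaultdict(lambda: defaultdict(int))
    for q in queues:
        t = totals[q["type"]]
        t["queues"] += 1
        t["pending"] += q["pending"]
        t["highest_pending"] = max(t["highest_pending"], q["highest_pending"])
        t["threads_started"] += q["threads_started"]
        t["threads_active"] += q["threads_active"]
        t["highest_active"] = max(t["highest_active"], q["highest_active"])
    for qtype, t in sorted(totals.items(), reverse=True):   # SMALL before LARGE
        print("%d %s queues, %d requests pending, %d was the highest number of "
              "requests pending." % (t["queues"], qtype, t["pending"],
                                     t["highest_pending"]))
        print("    %d threads started, %d threads active, %d was the highest "
              "number of threads active." % (t["threads_started"],
                                             t["threads_active"],
                                             t["highest_active"]))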

One other thing we have noted is that our home-grown RRDtool monitoring plots, 
which are based on netstat, iostat, vmstat, etc., also show an oddity.  Most of 
our LUNs show up as 33 - 68% utilized … but all the LUNs on eon34 (there are 4 
in total) show up as 93 - 97% utilized.  And another oddity there is that 
eon34A and eon34B rarely show up in the alert e-mails, while eon34C and eon34E 
show up waaaayyyyyyy more than anything else … the difference between them is 
that A and B are on the storage array itself, while C and E are on JBODs 
SAS-attached to the storage array (and yes, we’ve actually checked and reseated 
those connections).

Another reason I could not respond earlier today is that this afternoon I 
upgraded the RAM on nsd2 and nsd4 from 16 GB and 24 GB, respectively, to 64 GB 
each … and I then upped the pagepool on those two boxes to 40 GB.  That has not 
made a difference.  How can I determine how much of the pagepool is actually 
being used, BTW?  A quick Google search didn’t help me.

So we’re trying to figure out if we have storage hardware issues causing GPFS 
issues or GPFS issues causing storage slowdowns.  The fact that I see slowdowns 
most often on one storage array points in one direction, while the fact that at 
times I see even worse slowdowns on multiple other arrays points the other way. 
 The fact that some NSD servers show better stats than others in the analysis 
of the “mmfsadm dump nsd” output tells me … well, I don’t know what it tells me.

I think that’s all for now.  If you have read this entire very long e-mail, 
first off, thank you!  If you’ve read it and have ideas for where I should go 
from here, T-H-A-N-K Y-O-U!

Kevin

> On Jul 4, 2018, at 7:34 AM, Aaron Knister <[email protected]> wrote:
> 
> Hi Kevin,
> 
> Just going out on a very weird limb here... but you're not by chance seeing 
> this behavior on DDN hardware that runs the SFA OS, are you? (e.g. SFA12K, 7K, 
> 14K, etc.) We just started seeing some very weird and high latency on some of 
> our SFA12Ks (which have otherwise been solid in terms of both stability and 
> performance), but only on certain volumes, and the affected volumes change. 
> It's very bizarre, and we've been working closely with DDN to track down the 
> root cause, but we've not yet found a smoking gun. The timing and description 
> of your problem sounded eerily similar to what we're seeing, so I thought 
> I'd ask.
> 
> -Aaron
> 
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> 
> 
> On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote:
> 
>> Hi all,
>> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of 
>> our NSDs as reported by "mmdiag --iohist" and are struggling to understand 
>> why.  One of the
>> confusing things is that, while certain NSDs tend to show the problem more 
>> than others, the problem is not consistent … i.e. the problem tends to move 
>> around from
>> NSD to NSD (and storage array to storage array) whenever we check … which is 
>> sometimes just a few minutes apart.
>> In the past when I have seen "mmdiag --iohist" report high wait times like 
>> this it has *always* been hardware related.  In our environment, the most 
>> common cause has
>> been a battery backup unit on a storage array controller going bad and the 
>> storage array switching to write straight to disk.  But that’s *not* 
>> happening this time.
>> Is there anything within GPFS / outside of a hardware issue that I should be 
>> looking for??  Thanks!
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> [email protected] - (615)875-9633
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
