Thanks everyone! I also have a PMR open for this, so hopefully the RFE gets some traction.

On 8/18/16 11:14 AM, McPheeters, Gordon wrote:
Got my vote -  thanks Robert.


Gordon McPheeters
ALCF Storage
(630) 252-6430
[email protected] <mailto:[email protected]>



On Aug 18, 2016, at 10:00 AM, Bryan Banister
<[email protected] <mailto:[email protected]>> wrote:

Great stuff… I added my vote,
-Bryan

*From:* [email protected]
<mailto:[email protected]> 
[mailto:[email protected]] *On
Behalf Of *Oesterlin, Robert
*Sent:* Thursday, August 18, 2016 9:47 AM
*To:* gpfsug main discussion list
*Subject:* Re: [gpfsug-discuss] Monitor NSD server queue?

Done.

Notification generated at: 18 Aug 2016, 10:46 AM Eastern Time (ET)

ID:           93260
Headline:     Give sysadmin insight into the inner workings of the
              NSD server machinery, in particular the queue dynamics
Submitted on: 18 Aug 2016, 10:46 AM Eastern Time (ET)
Brand:        Servers and Systems Software
Product:      Spectrum Scale (formerly known as GPFS) - Public RFEs

Link:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=93260


Bob Oesterlin
Sr Storage Engineer, Nuance HPC Grid
507-269-0413


*From: *<[email protected]
<mailto:[email protected]>> on behalf of Yuri L
Volobuev <[email protected] <mailto:[email protected]>>
*Reply-To: *gpfsug main discussion list
<[email protected]
<mailto:[email protected]>>
*Date: *Wednesday, August 17, 2016 at 3:34 PM
*To: *gpfsug main discussion list <[email protected]
<mailto:[email protected]>>
*Subject: *[EXTERNAL] Re: [gpfsug-discuss] Monitor NSD server queue?


Unfortunately, at the moment there's no safe mechanism to show the
usage statistics for different NSD queues. "mmfsadm saferdump nsd" as
implemented doesn't acquire locks when parsing internal data
structures. Now, NSD data structures are fairly static, as such things
go, so the risk of following a stale pointer and hitting a segfault
isn't particularly significant. I don't think I remember ever seeing
mmfsd crash with NSD dump code on the stack. That said, this isn't
code that's tested and known to be safe for production use. I haven't
seen a case myself where an mmfsd thread gets stuck running this dump
command, either, but Bob has. If that condition ever reoccurs, I'd be
interested in seeing debug data.

I agree that there's value in giving a sysadmin insight into the inner
workings of the NSD server machinery, in particular the queue
dynamics. mmdiag should be enhanced to allow this. That'd be a very
reasonable (and doable) RFE.

yuri


From: "Oesterlin, Robert" <[email protected]
<mailto:[email protected]>>
To: gpfsug main discussion list <[email protected]
<mailto:[email protected]>>,
Date: 08/17/2016 04:45 AM
Subject: Re: [gpfsug-discuss] Monitor NSD server queue?
Sent by: [email protected]
<mailto:[email protected]>

------------------------------------------------------------------------




Hi Aaron

You did a perfect job of explaining a situation I've run into time
after time - high latency on the disk subsystem causing a backup in
the NSD queues. I was doing what you suggested not to do - "mmfsadm
saferdump nsd" and looking at the queues. In my case "mmfsadm
saferdump" would usually either work or hang, rather than kill mmfsd.
But the hang usually resulted in a tied-up thread in mmfsd, so that's
no good either.
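To at least keep the monitoring side sane, I've taken to running the
dump under a hard timeout - a rough Python sketch, nothing official.
It can't protect mmfsd itself, only the collector; the mmfsadm path
below is the usual install location, adjust to taste:

```python
# Sketch: run the dump under a timeout so a wedged dump doesn't also
# wedge the monitoring script. This cannot un-stick an mmfsd thread;
# it only keeps the collector from hanging alongside it.
import subprocess

def safe_dump(cmd=("/usr/lpp/mmfs/bin/mmfsadm", "saferdump", "nsd"),
              timeout=30):
    """Return the dump text, or None if the command hung past timeout."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True,
                              text=True, timeout=timeout)
        return proc.stdout
    except subprocess.TimeoutExpired:
        return None  # treat as "dump wedged" and alert instead of retrying
```

A None return is itself a useful signal - that's exactly the stuck
condition Yuri asked for debug data on.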

I wish I had better news - this is the only way I've found to get
visibility into these queues. IBM hasn't seen fit to give us a way to
safely look at these. I personally think it's a bug that we can't
safely dump these structures, as they give insight as to what's
actually going on inside the NSD server.

Yuri, Sven - thoughts?


Bob Oesterlin
Sr Storage Engineer, Nuance HPC Grid



*From: *<[email protected]
<mailto:[email protected]>> on behalf of
"Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]"
<[email protected] <mailto:[email protected]>>*
Reply-To: *gpfsug main discussion list
<[email protected]
<mailto:[email protected]>>*
Date: *Tuesday, August 16, 2016 at 8:46 PM*
To: *gpfsug main discussion list <[email protected]
<mailto:[email protected]>>*
Subject: *[EXTERNAL] [gpfsug-discuss] Monitor NSD server queue?

Hi Everyone,

We ran into a rather interesting situation over the past week. We had
a job that was pounding the ever loving crap out of one of our
filesystems (called dnb02) doing about 15GB/s of reads. We had other
jobs experience a slowdown on a different filesystem (called dnb41)
that uses entirely separate backend storage. What I can't figure out
is why this other filesystem was affected. I've checked IB bandwidth
and congestion, Fibre Channel bandwidth and errors, Ethernet bandwidth
and congestion, looked at the mmpmon nsd_ds counters (including disk
request wait time), and checked out the disk iowait values from
collectl. I simply can't account for the slowdown on the other
filesystem. The only thing I can think of is the high latency on
dnb02's NSDs caused the mmfsd NSD queues to back up.
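To convince myself that theory even makes sense, here's a toy Python
sketch - not GPFS code, just a shared worker pool standing in for the
NSD worker threads - showing how slow I/Os for one filesystem can
inflate latency for another whose own storage is idle:

```python
# Toy illustration: one shared pool of "NSD worker threads" serves
# requests for two filesystems. Slow fsA requests occupy the workers,
# so fsB requests queue behind them despite fsB's fast backend.
import queue
import threading
import time

WORKERS = 4
q = queue.Queue()
done = {}  # filesystem name -> list of observed latencies

def worker():
    while True:
        item = q.get()
        if item is None:
            return  # shutdown sentinel
        fs, start, service = item
        time.sleep(service)  # simulated disk service time
        done.setdefault(fs, []).append(time.time() - start)
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for t in threads:
    t.start()

now = time.time()
for _ in range(20):  # fsA: slow backend, 50 ms per I/O
    q.put(("fsA", now, 0.05))
for _ in range(5):   # fsB: fast backend, 1 ms per I/O
    q.put(("fsB", now, 0.001))
q.join()
for _ in threads:
    q.put(None)
for t in threads:
    t.join()

avg_b = sum(done["fsB"]) / len(done["fsB"])
print(f"avg fsB latency behind fsA load: {avg_b * 1000:.0f} ms")
```

The fsB requests end up waiting hundreds of times their own service
time, purely from queueing behind fsA - which is the cross-filesystem
slowdown pattern we saw.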

Here's my question-- how can I monitor the state of the NSD queues? I
can't find anything in mmdiag. An "mmfsadm saferdump nsd" shows me the
queues and their status. I'm just not sure calling saferdump nsd every
10 seconds to monitor this data is going to end well. I've seen
saferdump nsd cause mmfsd to die, and that's from a task we only run
every 6 hours that calls saferdump nsd.
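If anyone does want to scrape the dump output, something along these
lines is what I had in mind - a minimal Python sketch. Note the queue
line format in the sample is invented for illustration, so the regex
would need adjusting to whatever your saferdump actually prints:

```python
# Hypothetical sketch: summarize per-queue backlog from "mmfsadm
# saferdump nsd" output. The line format below is an assumption for
# illustration only -- match the regex to your real dump output.
import re

SAMPLE = """\
Queue 0 type small requests pending 12 highest pending 40 threads 8
Queue 1 type large requests pending 0 highest pending 3 threads 12
"""

LINE = re.compile(
    r"Queue\s+(?P<qid>\d+)\s+type\s+(?P<type>\w+)\s+"
    r"requests pending\s+(?P<pending>\d+)\s+"
    r"highest pending\s+(?P<highest>\d+)\s+threads\s+(?P<threads>\d+)"
)

def parse_queues(text):
    """Return one dict per queue line, with numeric fields as ints."""
    out = []
    for m in LINE.finditer(text):
        d = m.groupdict()
        out.append({k: (v if k == "type" else int(v))
                    for k, v in d.items()})
    return out

queues = parse_queues(SAMPLE)
backlog = sum(entry["pending"] for entry in queues)
print(f"{len(queues)} queues, total pending {backlog}")
```

Feed that a total-pending gauge every interval and you'd at least see
the backup building - assuming, of course, the dump itself behaves.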

Any thoughts/ideas here would be great.

Thanks!

-Aaron_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org <http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss







--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
