I believe it was created with -n 5000. Here's the exact command that was used:

/usr/lpp/mmfs/bin/mmcrfs dnb03 -F ./disc_mmcrnsd_dnb03.lst -T /gpfsm/dnb03 -j cluster -B 1M -n 5000 -N 20M -r1 -R2 -m2 -M2 -A no -Q yes -v yes -i 512 --metadata-block-size=256K -L 8388608
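
(And to double-check whether it was changed later, mmlsfs should echo the
current value back -- I'm going from memory on the flag, so verify it against
the man page:

/usr/lpp/mmfs/bin/mmlsfs dnb03 -n
)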

-Aaron

On 3/24/17 2:05 PM, Sven Oehme wrote:
was this filesystem created with -n 5000? or was that changed later
with mmchfs?
please send the mmlsconfig/mmlscluster output to me at oeh...@us.ibm.com



On Fri, Mar 24, 2017 at 10:58 AM Aaron Knister <aaron.s.knis...@nasa.gov> wrote:

    I feel a little awkward about posting lists of IPs and hostnames on
    the mailing list (even though they're all internal), but I'm happy to
    send them to you directly. I've attached both an mmlsfs and an mmdf
    output of the fs in question here since that may be useful for others
    to see. Just a note about disk d23_02_021 -- it's been evacuated for
    several weeks now due to a hardware issue in the disk enclosure.

    The fs is rather full percentage-wise (93%), but in terms of capacity
    there's a good amount free -- 93% full of a 7PB filesystem still leaves
    551T. Metadata, as you'll see, is 31% free (roughly 800GB).

    The fs has 40M inodes allocated and 12M free.
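
    (A quick way to keep an eye on the fill grade Sven asked about -- the
    mmdf -F flag is from memory, so double-check it against the man page --
    is something like:

    /usr/lpp/mmfs/bin/mmdf dnb03 -F     # inode allocation and free counts
    df -i /gpfsm/dnb03                  # the same view from the VFS side
    )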

    -Aaron

    On 3/24/17 1:41 PM, Sven Oehme wrote:
    > ok, that seems like a different problem than i was thinking.
    > can you send the output of mmlscluster, mmlsconfig, and mmlsfs all?
    > also, are you getting close to fill grade on inodes or capacity on any
    > of the filesystems?
    >
    > sven
    >
    >
    > On Fri, Mar 24, 2017 at 10:34 AM Aaron Knister <aaron.s.knis...@nasa.gov> wrote:
    >
    >     Here's the screenshot from the other node with the high cpu utilization.
    >
    >     On 3/24/17 1:32 PM, Aaron Knister wrote:
    >     > heh, yep we're on sles :)
    >     >
    >     > here's a screenshot of the fs manager from the deadlocked filesystem. I
    >     > don't think there's an NSD server or manager node that's running full
    >     > throttle across all CPUs. There is one that's got relatively high CPU
    >     > utilization though (300-400%). I'll send a screenshot of it in a sec.
    >     >
    >     > no zimon yet but we do have other tools to see cpu utilization.
    >     >
    >     > -Aaron
    >     >
    >     > On 3/24/17 1:22 PM, Sven Oehme wrote:
    >     >> you must be on sles, as this segfaults only on sles to my knowledge :-)
    >     >>
    >     >> i am looking for an NSD or manager node in your cluster that runs at
    >     >> 100% cpu usage.
    >     >>
    >     >> do you have zimon deployed to look at cpu utilization across your nodes?
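    >     >>
    >     >> (if zimon isn't in place yet, even a crude sweep is enough to spot a hot
    >     >> node -- a plain ssh loop, nothing gpfs-specific; nsd_nodes.txt here is
    >     >> just a hypothetical file listing your NSD/manager hostnames:
    >     >>
    >     >>     for n in $(cat nsd_nodes.txt); do
    >     >>         echo "== $n"; ssh "$n" 'top -bn1 | head -4'
    >     >>     done
    >     >> )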
    >     >>
    >     >> sven
    >     >>
    >     >>
    >     >>
    >     >> On Fri, Mar 24, 2017 at 10:08 AM Aaron Knister <aaron.s.knis...@nasa.gov> wrote:
    >     >>
    >     >>     Hi Sven,
    >     >>
    >     >>     Which NSD server should I run top on, the fs manager? If so the
    >     >>     CPU load is about 155%. I'm working on perf top but not off to a
    >     >>     great start...
    >     >>
    >     >>     # perf top
    >     >>         PerfTop:    1095 irqs/sec  kernel:61.9%  exact:  0.0%  [1000Hz cycles],  (all, 28 CPUs)
    >     >>     ------------------------------------------------------------------
    >     >>
    >     >>     Segmentation fault
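    >     >>
    >     >>     (If perf top keeps segfaulting, a fallback -- just the stock perf
    >     >>     record/report workflow, nothing GPFS-specific, and untested on this
    >     >>     box -- would be to sample for a few seconds and read it offline:
    >     >>
    >     >>     # perf record -a -g -- sleep 10
    >     >>     # perf report --sort symbol | head -40
    >     >>
    >     >>     a spinlock routine near the top of that list would be the same thing
    >     >>     Sven is asking about.)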
    >     >>
    >     >>     -Aaron
    >     >>
    >     >>     On 3/24/17 1:04 PM, Sven Oehme wrote:
    >     >>     > while this is happening, run top and see if there is very high cpu
    >     >>     > utilization at this time on the NSD server.
    >     >>     >
    >     >>     > if there is, run perf top (you might need to install the perf
    >     >>     > command) and see if the top cpu contender is a spinlock. if so, send
    >     >>     > a screenshot of perf top, as i may know what that is and how to fix it.
    >     >>     >
    >     >>     > sven
    >     >>     >
    >     >>     >
    >     >>     > On Fri, Mar 24, 2017 at 9:43 AM Aaron Knister <aaron.s.knis...@nasa.gov> wrote:
    >     >>     >
    >     >>     >     Since yesterday morning we've noticed some deadlocks on one of
    >     >>     >     our filesystems that seem to be triggered by writing to it. The
    >     >>     >     waiters on the clients look like this:
    >     >>     >
    >     >>     >     0x19450B0 (   6730) waiting 2063.294589599 seconds, SyncHandlerThread:
    >     >>     >     on ThCond 0x1802585CB10 (0xFFFFC9002585CB10) (InodeFlushCondVar), reason
    >     >>     >     'waiting for the flush flag to commit metadata'
    >     >>     >     0x7FFFDA65E200 (  22850) waiting 0.000246257 seconds, AllocReduceHelperThread:
    >     >>     >     on ThCond 0x7FFFDAC7FE28 (0x7FFFDAC7FE28) (MsgRecordCondvar), reason
    >     >>     >     'RPC wait' for allocMsgTypeRelinquishRegion on node 10.1.52.33 <c0n3271>
    >     >>     >     0x197EE70 (   6776) waiting 0.000198354 seconds, FileBlockWriteFetchHandlerThread:
    >     >>     >     on ThCond 0x7FFFF00CD598 (0x7FFFF00CD598) (MsgRecordCondvar), reason
    >     >>     >     'RPC wait' for allocMsgTypeRequestRegion on node 10.1.52.33 <c0n3271>
    >     >>     >
    >     >>     >     (10.1.52.33/c0n3271 is the fs manager for the filesystem in
    >     >>     >     question)
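    >     >>     >
    >     >>     >     (for anyone who wants to pull the same view on their own cluster,
    >     >>     >     this kind of dump should come out of something like:
    >     >>     >
    >     >>     >     /usr/lpp/mmfs/bin/mmdiag --waiters
    >     >>     >
    >     >>     >     I'm assuming mmdiag here; on older releases "mmfsadm dump waiters"
    >     >>     >     gives a similar list.)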
    >     >>     >
    >     >>     >     there's a single process running on this node writing to the
    >     >>     >     filesystem in question (well, trying to write -- it's been blocked
    >     >>     >     doing nothing for half an hour now). There are ~10 other client
    >     >>     >     nodes in this situation right now. We had many more last night
    >     >>     >     before the problem seemed to disappear in the early hours of the
    >     >>     >     morning, and now it's back.
    >     >>     >
    >     >>     >     Waiters on the fs manager look like this. While each individual
    >     >>     >     waiter is short, it's a near-constant stream:
    >     >>     >
    >     >>     >     0x7FFF60003540 (   8269) waiting 0.001151588 seconds, Msg handler
    >     >>     >     allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF601C8860 (  20606) waiting 0.001115712 seconds, Msg handler
    >     >>     >     allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF91C10080 (  14723) waiting 0.000959649 seconds, Msg handler
    >     >>     >     allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFFB03C2910 (  12636) waiting 0.000769611 seconds, Msg handler
    >     >>     >     allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF8C092850 (  18215) waiting 0.000682275 seconds, Msg handler
    >     >>     >     allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF9423F730 (  12652) waiting 0.000641915 seconds, Msg handler
    >     >>     >     allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF9422D770 (  12625) waiting 0.000494256 seconds, Msg handler
    >     >>     >     allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >     0x7FFF9423E310 (  12651) waiting 0.000437760 seconds, Msg handler
    >     >>     >     allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
    >     >>     >     (0xFFFFC9002163A2E0) (AllocManagerMutex)
    >     >>     >
    >     >>     >     I don't know if this data point is useful, but both yesterday and
    >     >>     >     today the metadata NSDs for this filesystem have had a constant
    >     >>     >     aggregate stream of reads at about 25MB/s and 4k op/s during each
    >     >>     >     episode (very low latency though, so I don't believe the storage
    >     >>     >     is a bottleneck here). Writes are only a few hundred ops and
    >     >>     >     didn't strike me as odd.
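    >     >>     >
    >     >>     >     (if anyone wants to see the same kind of I/O pattern from the GPFS
    >     >>     >     side, recent per-disk I/O sizes and latencies should show up with
    >     >>     >     something like the following on the NSD servers -- hedging a bit,
    >     >>     >     since the iohist output format varies between releases:
    >     >>     >
    >     >>     >     /usr/lpp/mmfs/bin/mmdiag --iohist
    >     >>     >     )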
    >     >>     >
    >     >>     >     I have a PMR open for this but I'm curious if folks have seen
    >     >>     >     this in the wild and what it might mean.
    >     >>     >
    >     >>     >     -Aaron
    >     >>     >
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
