Re: [gpfsug-discuss] mmsysmon.py revisited

Jonathon A Anderson Wed, 19 Jul 2017 11:34:56 -0700

OPA behaves _significantly_ differently from Mellanox IB. OPA uses the host CPU 
for packet processing, whereas Mellanox IB uses a discrete ASIC on the HCA. As 
a result, in our experience, OPA is much more sensitive to task placement and 
interrupts, because the host CPU load competes with the fabric I/O processing 
load.
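
To illustrate what "task placement" can mean in practice, here is a minimal Python sketch, assuming a Linux node, that splits the available cores into a small housekeeping set and a compute set and pins a process to one of them. The helper names (split_cpus, pin) and the choice of reserving the lowest-numbered cores are hypothetical; nothing in this thread says this is how any site or GPFS itself handles placement.

```python
import os


def split_cpus(cpus, n_housekeeping):
    """Split CPU ids into (housekeeping, compute) sets.

    Assumption for illustration only: the lowest-numbered cores are the
    ones reserved for daemons and interrupt handling.
    """
    ordered = sorted(cpus)
    if not 0 < n_housekeeping < len(ordered):
        raise ValueError("n_housekeeping must leave at least one core per set")
    return set(ordered[:n_housekeeping]), set(ordered[n_housekeeping:])


def pin(pid, cpus):
    """Restrict process `pid` to the given cores (Linux only)."""
    os.sched_setaffinity(pid, cpus)
```

For example, `housekeeping, compute = split_cpus(os.sched_getaffinity(0), 1)` followed by `pin(daemon_pid, housekeeping)` would keep a monitoring daemon off the cores used by MPI ranks. Whether that is enough on an OPA fabric would have to be measured.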


~jonathon


On 7/19/17, 12:12 PM, "[email protected] on behalf of 
[email protected]" <[email protected] on behalf of 
[email protected]> wrote:

    We have an FDR14 Mellanox fabric, probably with an interrupt load similar to OPA's. 
    
      -- ddj
    Dave Johnson
    
    On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson <[email protected]> wrote:
    
    >> It might be a problem specific to your system environment or a wrong
    >> configuration therefore please get in contact with IBM support to analyze
    >> the root cause of the high usage.
    > 
    > I suspect it’s actually a result of frequent IO interrupts causing jitter
    > in conflict with MPI on the shared Intel Omni-Path network, in our case.
    > 
    > We’ve already tried pursuing support on this through our vendor, DDN, and
    > got nowhere. Eventually we were the ones who tried killing mmsysmon, and
    > that fixed our problem.
    > 
    > The official company line of “we don't see significant CPU consumption by
    > mmsysmon on our test systems” isn’t helping. Do you have a test system
    > with OPA?
    > 
    > ~jonathon
    > 
    > 
    > On 7/19/17, 7:05 AM, "[email protected] on behalf of Mathias Dietz"
    > <[email protected] on behalf of [email protected]> wrote:
    > 
    >    Thanks for the feedback.
    > 
    >    Let me clarify what mmsysmon is doing.
    >    Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for
    >    overall health monitoring and CES failover handling.
    >    Even without CES it is an essential part of the system, because it
    >    monitors the individual components and provides health state
    >    information and error events.
    > 
    >    This information is needed by other Spectrum Scale components (the
    >    mmhealth command, the IBM Spectrum Scale GUI, support tools, the
    >    Install Toolkit, ...), so disabling mmsysmon will impact them.
    > 
    > 
    >> It’s a huge problem. I don’t understand why it hasn’t been given
    >> much credit by dev or support.
    > 
    >    Over the last couple of months, the development team has put a strong
    >    focus on this topic.
    > 
    >    In order to monitor the health of the individual components, mmsysmon
    >    listens for notifications/callbacks but also has to do some polling.
    >    We are constantly trying to reduce the polling overhead and to replace
    >    polling with notifications where possible.
    > 
    > 
    >    Several improvements have been added to 4.2.3, including the ability
    >    to configure the polling frequency to reduce the overhead (mmhealth
    >    config interval).
    > 
    >    See
    >    https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
    > 
    >    In addition, a new option has been introduced to clock-align the
    >    monitoring threads in order to reduce CPU jitter.
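
The general idea behind clock alignment can be sketched in a few lines of Python. This is only an illustration of the technique, not how mmsysmon implements it: each periodic task sleeps until the next wall-clock multiple of its interval, so nodes with synchronized clocks do their monitoring work at the same instants instead of at random offsets. The function names and the injectable clock/sleep parameters are assumptions made for this example.

```python
import math
import time


def next_aligned_wakeup(now, interval):
    """Return the first multiple of `interval` strictly after `now`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (math.floor(now / interval) + 1) * interval


def run_aligned(task, interval, iterations, clock=time.time, sleep=time.sleep):
    """Run `task` `iterations` times, each time on an aligned boundary.

    `clock` and `sleep` are parameters only so the loop can be exercised
    without really waiting; they are not part of any GPFS interface.
    """
    for _ in range(iterations):
        now = clock()
        sleep(max(0.0, next_aligned_wakeup(now, interval) - now))
        task()
```

With a 5-second interval, a thread that starts at t=10.5 would run at t=15 and t=20, and so would the same thread on every other node whose clock agrees. For a tightly coupled MPI job this tends to cost less than each node being interrupted at a different moment.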
    > 
    > 
    >    Nevertheless, we don't see significant CPU consumption by mmsysmon on
    >    our test systems.
    > 
    >    It might be a problem specific to your system environment or a wrong
    >    configuration therefore please get in contact with IBM support to
    >    analyze the root cause of the high usage.
    > 
    >    Kind regards
    > 
    >    Mathias Dietz
    > 
    >    IBM Spectrum Scale - Release Lead Architect and RAS Architect
    > 
    > 
    > 
    >    [email protected] wrote on 07/18/2017 07:51:21 PM:
    > 
    >> From: Jonathon A Anderson <[email protected]>
    >> To: gpfsug main discussion list <[email protected]>
    >> Date: 07/18/2017 07:51 PM
    >> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
    >> Sent by: [email protected]
    >> 
    >> There’s no official way to cleanly disable it yet, so far as I know;
    >> but you can de facto disable it by deleting
    >> /var/mmfs/mmsysmon/mmsysmonitor.conf.
    >> 
    >> It’s a huge problem. I don’t understand why it hasn’t been given 
    >> much credit by dev or support.
    >> 
    >> ~jonathon
    >> 
    >> 
    >> On 7/18/17, 11:21 AM, "[email protected] on 
    >> behalf of David Johnson" <[email protected] 
    >> on behalf of [email protected]> wrote:
    >> 
    >>    We also noticed a fair amount of CPU time accumulated by mmsysmon.py
    >>    on our diskless compute nodes. I read the earlier query, where it
    >>    was answered:
    >> 
    >>    ces == Cluster Export Services. mmsysmon.py comes from mmcesmon.
    >>    It is used for managing export services of GPFS. If it is killed,
    >>    your nfs/smb etc. will stop working.
    >>    Their overhead is small and they are very important. Don't
    >>    attempt to kill them.
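
A rough way to check how much CPU time a daemon such as mmsysmon.py has actually accumulated on a node is to read its /proc entry. The sketch below assumes Linux; the function names are made up for illustration, and this is not a GPFS tool.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + system CPU seconds from the text of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces, so the line is split after the last ')'. The remaining fields
    start at field 3, which puts utime (field 14) and stime (field 15)
    at indexes 11 and 12.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / ticks_per_second


def cpu_seconds(pid):
    """Read the accumulated CPU seconds for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_seconds_from_stat(f.read(), os.sysconf("SC_CLK_TCK"))
```

The PID can be found with something like `pgrep -f mmsysmon.py`. Sampling `cpu_seconds(pid)` twice a few minutes apart gives a rate rather than a lifetime total, which is easier to compare between nodes.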
    >> 
    >>    Our question is this: we don’t run the latest “protocols”; our
    >>    NFS is CNFS, and our CIFS is clustered CIFS.
    >>    I can understand it might be needed with Ganesha, but on every node?
    >> 
    >>    Why in the world would I be getting this daemon running on all
    >>    client nodes, when I didn’t install the “protocols” version
    >>    of the distribution? We have release 4.2.2 at the moment. How
    >>    can we disable this?
    >> 
    >> 
    >>    Thanks,
    >>     — ddj
    >> 
    >> 
    >> _______________________________________________
    >> gpfsug-discuss mailing list
    >> gpfsug-discuss at spectrumscale.org
    >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    > 
    > 
    > 
    

