Re: [gpfsug-discuss] mmsysmon.py revisited

Jonathon A Anderson Wed, 19 Jul 2017 10:52:43 -0700

> It might be a problem specific to your system environment or a wrong 
> configuration therefore please get in contact with IBM support to analyze the 
> root cause of the high usage.


In our case, I suspect it’s actually the result of frequent I/O interrupts
causing jitter that conflicts with MPI on the shared Intel Omni-Path network.

We’ve already tried pursuing support on this through our vendor, DDN, and got
nowhere. Eventually we were the ones who tried killing mmsysmon, and that
fixed our problem.

The official company line of “we don't see significant CPU consumption by 
mmsysmon on our test systems” isn’t helping. Do you have a test system with OPA?

~jonathon


On 7/19/17, 7:05 AM, "[email protected] on behalf of 
Mathias Dietz" <[email protected] on behalf of 
[email protected]> wrote:

    thanks for the feedback. 
    
    Let me clarify what mmsysmon is doing.
    Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for
    overall health monitoring and CES failover handling.
    Even without CES it is an essential part of the system, because it monitors
    the individual components and provides health state information and error
    events.
    
    This information is needed by other Spectrum Scale components (mmhealth
    command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit, ...)
    and therefore disabling mmsysmon will impact them.
    
    
    > It’s a huge problem. I don’t understand why it hasn’t been given
    > much credit by dev or support.
    
    Over the last couple of months, the development team has put a strong focus
    on this topic.
    
    In order to monitor the health of the individual components, mmsysmon
    listens for notifications/callbacks but also has to do some polling.
    We are continually trying to reduce the polling overhead and to replace
    polling with notifications where possible.
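
As an illustration of the notification-plus-polling pattern described above, here is a minimal Python sketch. It is not taken from mmsysmon; the function name `wait_for_state_change`, the queue of events, and the `poll_probe` callable are all hypothetical and only show the general idea: block on notifications, and fall back to a single probe if nothing arrives within the polling interval.

```python
import queue


def wait_for_state_change(events, poll_probe, poll_interval):
    """Wait for a pushed notification; poll once if none arrives in time.

    events        -- queue.Queue that callbacks push state changes into
                     (hypothetical stand-in for a notification channel)
    poll_probe    -- zero-argument callable that actively checks a component
    poll_interval -- seconds to wait before falling back to polling
    """
    try:
        # Cheap path: sleep until a callback delivers an event.
        return ("notification", events.get(timeout=poll_interval))
    except queue.Empty:
        # Fallback path: no event arrived, so run the probe ourselves.
        return ("poll", poll_probe())
```

The longer `poll_interval` is, the less often the fallback probe runs, which is the trade-off between overhead and detection latency that a configurable polling frequency exposes.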
    
    
    Several improvements have been added to 4.2.3, including the ability to
    configure the polling frequency to reduce the overhead (mmhealth config
    interval).
    
    See 
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
    In addition, a new option has been introduced to clock-align the monitoring
    threads in order to reduce CPU jitter.
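
For readers unfamiliar with clock alignment, the following Python sketch shows the general technique under stated assumptions; it is not the Spectrum Scale implementation, and the names `next_aligned_wakeup` and `sleep_until_aligned` are hypothetical. The idea is that each periodic thread wakes on a multiple of its interval on the wall clock, so that wakeups on nodes with synchronized clocks coincide instead of being scattered across the interval.

```python
import math
import time


def next_aligned_wakeup(now, interval):
    """Return the first multiple of `interval` strictly after `now`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (math.floor(now / interval) + 1) * interval


def sleep_until_aligned(interval, clock=time.time, sleep=time.sleep):
    """Sleep until the next wall-clock boundary and return that boundary.

    `clock` and `sleep` are injectable so the logic can be tested
    without actually waiting.
    """
    now = clock()
    target = next_aligned_wakeup(now, interval)
    sleep(max(0.0, target - now))
    return target
```

With a 5-second interval, a thread that finishes its work at t = 12.3 would next wake at t = 15 rather than at t = 17.3. For a tightly coupled MPI job, interruptions that hit all ranks at the same moment tend to cost less than the same interruptions spread out over time.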
    
    
    Nevertheless, we don't see significant CPU consumption by mmsysmon on our
    test systems.
    
    It might be a problem specific to your system environment or a wrong
    configuration therefore please get in contact with IBM support to analyze
    the root cause of the high usage.
    
    Kind regards
    
    Mathias Dietz
    
    IBM Spectrum Scale - Release Lead Architect and RAS Architect
    
    
    
    [email protected] wrote on 07/18/2017 07:51:21 PM:
    
    > From: Jonathon A Anderson <[email protected]>
    > To: gpfsug main discussion list <[email protected]>
    > Date: 07/18/2017 07:51 PM
    > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
    > Sent by: [email protected]
    > 
    > There’s no official way to cleanly disable it yet, so far as I know;
    > but you can de facto disable it by deleting
    > /var/mmfs/mmsysmon/mmsysmonitor.conf.
    > 
    > It’s a huge problem. I don’t understand why it hasn’t been given 
    > much credit by dev or support.
    > 
    > ~jonathon
    > 
    > 
    > On 7/18/17, 11:21 AM, "[email protected] on 
    > behalf of David Johnson" <[email protected] 
    > on behalf of [email protected]> wrote:
    > 
    >     
    >     
    >     
    >     We also noticed a fair amount of CPU time accumulated by mmsysmon.py
    >     on our diskless compute nodes. I read the earlier query, where it
    >     was answered:
    >     
    >     
    >     
    >     
    >     ces == Cluster Export Services. mmsysmon.py comes from mmcesmon.
    >     It is used for managing export services of GPFS. If it is killed,
    >     your NFS/SMB etc. will stop working.
    >     Their overhead is small and they are very important. Don't
    >     attempt to kill them.
    >     
    >     
    >     
    >     
    >     
    >     
    >     Our question is this — we don’t run the latest “protocols”, our
    >     NFS is CNFS, and our CIFS is clustered CIFS.
    >     I can understand it might be needed with Ganesha, but on every node?
    >     
    >     
    >     Why in the world would I be getting this daemon running on all
    >     client nodes, when I didn’t install the “protocols” version
    >     of the distribution? We have release 4.2.2 at the moment. How
    >     can we disable this?
    >     
    >     
    >     Thanks,
    >      — ddj
    >     
    > 
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    
    

