We have an FDR14 Mellanox fabric, probably with a similar interrupt load to OPA.

  -- ddj
Dave Johnson
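
A minimal sketch for quantifying how much CPU time mmsysmon.py has accumulated on a node, as reported for the diskless compute nodes later in this thread. It assumes a Linux host with the standard /proc layout described in proc(5). It also assumes that the monitor can be identified by the string "mmsysmon" in its command line, which is a hypothetical match pattern; adjust it if your release names the process differently. The helper names are illustrative only.

```python
"""Sum the CPU time accumulated by mmsysmon processes, read from /proc.

Assumes the Linux proc(5) layout; the "mmsysmon" match pattern is an assumption.
"""
import os


def cpu_seconds_from_stat(stat_line: str, clk_tck: int) -> float:
    """Return utime + stime in seconds from one /proc/<pid>/stat line.

    Waited-for children (cutime/cstime) are not included.
    """
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' and parse the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / clk_tck


def mmsysmon_cpu_seconds(pattern: str = "mmsysmon") -> dict:
    """Map PID -> accumulated CPU seconds for processes whose cmdline matches."""
    result = {}
    try:
        entries = os.listdir("/proc")
    except FileNotFoundError:
        # Not a Linux host with /proc mounted; nothing to report.
        return result
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if pattern not in cmdline:
                continue
            with open(f"/proc/{entry}/stat") as f:
                result[int(entry)] = cpu_seconds_from_stat(f.read(), clk_tck)
        except OSError:
            # The process exited or is not readable; skip it.
            continue
    return result


if __name__ == "__main__":
    for pid, secs in sorted(mmsysmon_cpu_seconds().items()):
        print(f"pid {pid}: {secs:.1f} s CPU")
```

Sampling the totals periodically and comparing them gives a rough rate of CPU consumption. It does not measure interrupt load or the resulting jitter, which would need other tools.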

On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson 
<[email protected]> wrote:

>> It might be a problem specific to your system environment or a 
>> configuration error, so please contact IBM support to analyze 
>> the root cause of the high usage.
> 
> In our case, I suspect it’s actually caused by frequent I/O interrupts 
> producing jitter that interferes with MPI on the shared Intel Omni-Path network.
> 
> We’ve already tried pursuing support on this through our vendor, DDN, and got 
> nowhere. Eventually we were the ones who tried killing mmsysmon, and that 
> fixed our problem.
> 
> The official company line of “we don't see significant CPU consumption by 
> mmsysmon on our test systems” isn’t helping. Do you have a test system with 
> OPA?
> 
> ~jonathon
> 
> 
> On 7/19/17, 7:05 AM, "[email protected] on behalf of 
> Mathias Dietz" <[email protected] on behalf of 
> [email protected]> wrote:
> 
>    Thanks for the feedback.
> 
>    Let me clarify what mmsysmon is doing.
>    Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for 
> overall health monitoring and CES failover handling.
>    Even without CES it is an essential part of the system because it monitors 
> the individual components and provides health state information and error 
> events.
> 
>    This information is needed by other Spectrum Scale components (the mmhealth 
> command, the IBM Spectrum Scale GUI, support tools, the Install Toolkit, ...), 
> so disabling mmsysmon will impact them.
> 
> 
>> It’s a huge problem. I don’t understand why it hasn’t been given
>> much credit by dev or support.
> 
>    Over the last couple of months, the development team has put a strong focus 
> on this topic.
> 
>    In order to monitor the health of the individual components, mmsysmon 
> listens for notifications/callbacks but also has to do some polling.
>    We are continually working to reduce the polling overhead and to replace 
> polling with notifications where possible.
> 
> 
>    Several improvements have been added in 4.2.3, including the ability to 
> configure the polling frequency to reduce the overhead (mmhealth config 
> interval).
> 
>    See 
> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
> 
>    In addition, a new option has been introduced to clock-align the 
> monitoring threads in order to reduce CPU jitter.
> 
> 
>    Nevertheless, we don't see significant CPU consumption by mmsysmon on our 
> test systems.
> 
>    It might be a problem specific to your system environment or a 
> configuration error, so please contact IBM support to analyze the 
> root cause of the high usage.
> 
>    Kind regards
> 
>    Mathias Dietz
> 
>    IBM Spectrum Scale - Release Lead Architect and RAS Architect
> 
> 
> 
>    [email protected] wrote on 07/18/2017 07:51:21 PM:
> 
>> From: Jonathon A Anderson <[email protected]>
>> To: gpfsug main discussion list <[email protected]>
>> Date: 07/18/2017 07:51 PM
>> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
>> Sent by: [email protected]
>> 
>> As far as I know, there’s no official way to cleanly disable it yet, 
>> but you can de facto disable it by deleting 
>> /var/mmfs/mmsysmon/mmsysmonitor.conf.
>> 
>> It’s a huge problem. I don’t understand why it hasn’t been given 
>> much credit by dev or support.
>> 
>> ~jonathon
>> 
>> 
>> On 7/18/17, 11:21 AM, "[email protected] on 
>> behalf of David Johnson" <[email protected] 
>> on behalf of [email protected]> wrote:
>> 
>>    We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
>>    our diskless compute nodes. I read the earlier query, where the answer 
>> was:
>> 
>>    CES == Cluster Export Services; mmsysmon.py comes from 
>> mmcesmon. It is used for managing GPFS export services. If it is 
>> killed, your NFS/SMB etc. will stop working.
>>    Their overhead is small and they are very important. Don't 
>> attempt to kill them.
>> 
>>    Our question is this: we don’t run the latest “protocols”; our 
>> NFS is CNFS, and our CIFS is clustered CIFS.
>>    I can understand it might be needed with Ganesha, but on every node? 
>> 
>>    Why in the world would I be getting this daemon running on all 
>> client nodes when I didn’t install the “protocols” version 
>> of the distribution? We have release 4.2.2 at the moment. How 
>> can we disable this?
>> 
>>    Thanks,
>>     — ddj
>> 
>> 
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss