Hi Mathias,

It's OK: when we remove the configuration file, the process doesn't start.
The problem occurs mainly on our compute nodes (all of them), and we don't use 
the GUI or CES.

Indeed, I confirm we don't see a performance impact with Linpack running on 
more than a hundred nodes. It appears especially when there is a lot of 
communication, which is the case for our applications. Our high-speed network 
is based on Intel OmniPath Fabric.

We are seeing irregular iteration times every 30 seconds. Enabling 
HyperThreading hides the issue a little, but it is still there.
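
To check whether slow iterations really line up with a 30-second period, something like the following could be used. This is only a minimal sketch: the function names, the 1.5x-median threshold and the synthetic data are assumptions made for illustration, not something taken from our application.

```python
from statistics import median

def find_spikes(timestamps, durations, factor=1.5):
    """Return the timestamps of iterations slower than factor * median duration."""
    baseline = median(durations)
    return [t for t, d in zip(timestamps, durations) if d > factor * baseline]

def spike_period(spike_times):
    """Median gap between consecutive spikes, or None with fewer than two spikes."""
    if len(spike_times) < 2:
        return None
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return median(gaps)

# Synthetic example: one iteration per second, every 30th iteration is slow.
timestamps = list(range(120))
durations = [2.0 if t % 30 == 0 else 1.0 for t in timestamps]

spikes = find_spikes(timestamps, durations)
print(spikes, spike_period(spikes))
```

With real data, the timestamps would be the wall-clock start time of each iteration; a median gap close to the monitor interval would support the correlation with mmsysmon.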

When using fewer cores per node (26 instead of 28), we don't see this behavior, 
as if the mmsysmon process needs a core for itself.

I agree with you, it might be a good idea to open a PMR...

Please find below the output of mmhealth node show --verbose

Node status:             HEALTHY

Component                Status                   Reasons
-------------------------------------------------------------------
GPFS                     HEALTHY                  -
NETWORK                  HEALTHY                  -
  ib0                      HEALTHY                  -
FILESYSTEM               HEALTHY                  -
  gpfs1                    HEALTHY                  -
  gpfs2                    HEALTHY                  -
DISK                     HEALTHY                  -

Thanks
Farid 

    On Thursday, 19 January 2017 at 19:21, Simon Thompson (Research Computing - IT 
Services) <[email protected]> wrote:
 

 On some of our nodes we were regularly seeing process hung timeouts in dmesg 
from a Python process, which I vaguely thought was related to the monitoring 
process (though we have other Python bits from OpenStack running on these 
boxes). These are all running 4.2.2.0 code.

Simon
________________________________________
From: [email protected] 
[[email protected]] on behalf of Mathias Dietz 
[[email protected]]
Sent: 19 January 2017 18:07
To: FC; gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Bad performance with GPFS system monitoring 
(mmsysmon) in GPFS 4.2.1.1

Hi Farid,

There is no official way to disable the system health monitoring because 
other components rely on it (e.g. GUI, CES, Install Toolkit, ...).
If you are fine with the consequences, you can just delete 
mmsysmonitor.conf, which will prevent the monitor from starting.

During our testing we did not see a significant performance impact caused by 
the monitoring.
In 4.2.2 some component monitors (e.g. disk) have been further improved to 
reduce polling and use notifications instead.

Nevertheless, I would like to better understand what the issue is.
What kind of workload do you run?
Do you see spikes in CPU usage every 30 seconds?
Is it the same on all cluster nodes or just on some of them?
Could you send us the output of "mmhealth node show -v" so we can see which 
monitors are active?
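
One way to look for such CPU spikes on Linux is to sample the accumulated CPU time of the suspect process from /proc/<pid>/stat at short intervals and compare consecutive samples. The sketch below only shows the parsing and the delta computation; the helper names and the sample line are hypothetical, and which PID to watch is left to the reader.

```python
def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before tokenizing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime

def cpu_deltas(samples):
    """CPU ticks consumed between consecutive samples."""
    return [b - a for a, b in zip(samples, samples[1:])]

# Hypothetical stat line, truncated after the fields used here.
line = "1234 (python mon) S 1 1234 1234 0 -1 4194304 100 0 0 0 50 25 0 0 20 0"
print(cpu_ticks_from_stat(line))

# Hypothetical tick counts sampled once per second.
print(cpu_deltas([75, 75, 76, 140, 141]))
```

A large delta recurring at the monitor interval would point to periodic CPU bursts from that process.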

It might make sense to open a PMR to get this issue fixed.

Thanks.


Mit freundlichen Grüßen / Kind regards

Mathias Dietz

Spectrum Scale - Release Lead Architect (4.2.X Release)
System Health and Problem Determination Architect
IBM Certified Software Engineer

----------------------------------------------------------------------------------------------------------
IBM Deutschland
Hechtsheimer Str. 2
55131 Mainz
Mobile: +49-15152801035
E-Mail: [email protected]
----------------------------------------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk 
Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294





From:        FC <[email protected]>
To:        "[email protected]" <[email protected]>
Date:        01/19/2017 07:06 AM
Subject:        [gpfsug-discuss] Bad performance with GPFS system monitoring 
(mmsysmon) in GPFS 4.2.1.1
Sent by:        [email protected]
________________________________



Hi all,

We are facing performance issues with some of our applications due to the GPFS 
system monitoring (mmsysmon) on CentOS 7.2.

Poor performance (increased iteration time) is seen every 30s, exactly matching 
the frequency at which mmsysmon runs; the default monitor interval is set to 
30s in /var/mmfs/mmsysmon/mmsysmonitor.conf.

Shutting down GPFS with mmshutdown doesn't stop this process. We stopped it 
with the mmsysmoncontrol command and now get a stable iteration time.

What are the impacts of disabling this process, other than losing access to 
the mmhealth commands?
Is there a proper way to disable it for good without doing it in rc.local or 
increasing the monitoring interval in the configuration file?

Thanks,
Farid
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


