OK, after a bit of a delay due to a hectic travel week, here is some more information on my GPFS performance collection. At the bottom are links to my server and client zimon config files, along with a link to my presentation from SSUG Argonne in June. I didn't actually present it, but I'm including it in case there's interest.
I used to have a home-brew system of periodic calls to mmpmon to collect data, pushing the results into Kafka. This was a bit cumbersome, and when SS 4.2 arrived I switched over to the built-in performance sensors (zimon) to collect the data. IBM has an "as-is" bridge between Grafana and the zimon collector that works reasonably well - they were supposed to release it, but it's been delayed. I will ask about it again and post more information if I get it.

My biggest struggle with the zimon configuration is the large memory requirement of the collector on large clusters (many clients, file systems, NSDs). I ended up deploying a federation of 6 collectors with 16 GB each for my larger clusters - even then I have to limit the number of stats and the amount of time I retain them. IBM is aware of the memory issue, and I believe they are looking at ways to reduce it.

As for which specific metrics I tend to look at:

gpfs_fis_bytes_read (written) - aggregated file system read and write stats
gpfs_nsdpool_bytes_read (written) - aggregated pool stats, as I have data and metadata split
gpfs_fs_tot_disk_wait_rd (wr) - NSD disk wait stats

These give me an overall sense of how things are going. I have a bunch of other, more detailed dashboards for individual file systems and clients that help me drill down. The built-in SS GUI is pretty good for small clusters, and it's getting some improvements in 4.2.1 that might make me take a closer look at it again.
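To give a feel for the federation and retention tuning mentioned above, here is a sketch of the relevant stanzas in the collector config (ZIMonCollector.cfg); hostnames, sizes, and durations are illustrative only, not my actual settings:

```
# ZIMonCollector.cfg (sketch - values illustrative)

# Federation: list the peer collectors; queries against any one of them
# can be answered from data spread across the federation.
peers = {
    host = "collector1"
    port = "9085"
},
{
    host = "collector2"
    port = "9085"
}

# Retention domains: each domain trades memory for history. Shrinking
# "ram" and "duration" here is how you cap the collector's footprint.
domains = {
    aggregation = 0      # raw data, as collected
    ram = "4g"
    duration = "12h"
},
{
    aggregation = 60     # 60-second aggregates
    ram = "2g"
    duration = "1w"
}
```

The memory pressure comes from the raw domain: every metric per client, file system, and NSD is held in RAM for the configured duration, which is why large clusters force you to either cut sensors or shorten retention.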
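For anyone curious what the old home-brew mmpmon approach looks like in practice, here is a minimal sketch (this is my reconstruction, not Bob's actual code; the field names follow the documented `mmpmon -p` machine-readable format, and the Kafka step is stubbed out as a print):

```python
def parse_fs_io_s(line):
    """Parse one 'fs_io_s' record from `mmpmon -p` output into a dict.

    In -p mode, a record is the tag "_fs_io_s_" followed by alternating
    "_key_ value" tokens, e.g. _nn_ <node> ... _br_ <bytes read> _bw_ <bytes written>.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "_fs_io_s_":
        return None  # not an fs_io_s record
    # After the record tag, tokens alternate: _key_ value _key_ value ...
    fields = dict(zip(tokens[1::2], tokens[2::2]))
    return {
        "node": fields.get("_nn_"),
        "filesystem": fields.get("_fs_"),
        "bytes_read": int(fields.get("_br_", 0)),
        "bytes_written": int(fields.get("_bw_", 0)),
    }

# Example record in the mmpmon -p fs_io_s layout (values made up).
sample = ("_fs_io_s_ _n_ 10.0.0.5 _nn_ node01 _rc_ 0 _t_ 1468416000 "
          "_tu_ 407431 _cl_ mycluster.example _fs_ work _d_ 8 "
          "_br_ 524288 _bw_ 1048576 _oc_ 12 _cc_ 10 _rdc_ 4 _wc_ 8 "
          "_dir_ 2 _iu_ 6")
rec = parse_fs_io_s(sample)
print(rec)  # in the real system, each record would be published to Kafka instead
```

A cron job or loop would run `mmpmon -p` against an input file of `fs_io_s` commands, feed each output line through this parser, and publish the dicts; the cumbersome part is doing your own scheduling, retention, and aggregation, which is exactly what zimon handles for you.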
I also look at the RPC waiters stats - not present in the 4.2.0 Grafana bridge, but I hear they are coming in 4.2.1.

My SSUG Argonne presentation (I didn't talk due to time constraints):
http://files.gpfsug.org/presentations/2016/anl-june/SSUG_Nuance_PerfTools.pdf

Zimon server config file:
https://www.dropbox.com/s/gvtfhhqfpsknfnh/ZIMonSensors.cfg.server?dl=0

Zimon client config file:
https://www.dropbox.com/s/k5i6rcnaco4vxu6/ZIMonSensors.cfg.client?dl=0

Bob Oesterlin
Sr Storage Engineer, Nuance HPC Grid

From: <[email protected]> on behalf of Brian Marshall <[email protected]>
Reply-To: gpfsug main discussion list <[email protected]>
Date: Wednesday, July 13, 2016 at 8:43 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert)

Robert,

1) Do you see any noticeable performance impact by running the performance monitoring?
2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful?

Thank you,
Brian Marshall
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
