Hi Keith,

We have barely begun with Zimon and have not (knock, knock) run up against any data loss or corruption issues with it.
However, getting data out of Zimon is something I have been thinking about, partly because of the granularity that is lost over time, as with any round-robin-style data collection scheme. So one question is whether you have considered pulling the data out to another database, looked at the Spectrum Scale GUI, which uses a postgres DB (IIRC; about to take off on a flight and can't check), or looked at the Grafana bridge, which would get data into OpenTSDB format (again, IIRC). Anyway, just some things for consideration, and a request to share back whatever you find out if it's off-list.

Thanks; getting the stink eye to go to airplane mode. More later.

Cheers,
Kristy

On Sep 24, 2017 11:05 AM, "Keith Ball" <[email protected]> wrote:

Hello All,

In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. Over a period of two months, we lost data from the zimon database twice: once after the virtual disk serving both the OS files and the zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain date and time; likewise, mmperfmon query output only went back to the same time).

Details:
- Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients.
- Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60 GB (1 GB/file * 60 files)):

domains = {
    # this is the raw domain
    aggregation = 0   # aggregation factor for the raw domain is always 0.
    ram = "12g"       # amount of RAM to be used
    duration = "2m"   # amount of time that data with the highest precision is kept.
    filesize = "1g"   # maximum file size
    files = 60        # number of files.
},
{
    # this is the first aggregation domain that aggregates to 10 seconds
    aggregation = 10
    ram = "800m"      # amount of RAM to be used
    duration = "6m"   # keep 10-second aggregates for 6 months.
    filesize = "1g"   # maximum file size
    files = 10        # number of files.
},
{
    # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes
    aggregation = 30
    ram = "800m"      # amount of RAM to be used
    duration = "1y"   # keep 5-minute averages for 1 year.
    filesize = "1g"   # maximum file size
    files = 5         # number of files.
},
{
    # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours
    aggregation = 24
    ram = "800m"      # amount of RAM to be used
    duration = "2y"   # keep 2-hour averages for 2 years.
    filesize = "1g"   # maximum file size
    files = 5         # number of files.
}

Questions:
1.) Has anyone had similar issues with losing data from zimon?
2.) Are there known circumstances where data could be lost, e.g., changing the aggregation domain definitions, or even simply restarting the zimon collector?
3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons).

In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definition and the zimon collector VM) was torn down.

Many Thanks,
Keith

--
Keith D. Ball, PhD
RedLine Performance Solutions, LLC
web: http://www.redlineperf.com/
email: [email protected]
cell: 540-557-7851

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
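[Editor's note: as a quick sanity check on the domains posted above, each domain's aggregation factor multiplies the previous domain's interval. Assuming a 1-second base resolution for the raw domain (actual sensor periods may differ), this sketch reproduces the 10-second, 5-minute, and 2-hour figures from the config comments:]

# Hypothetical helper: walk a chain of zimon aggregation factors and
# print each domain's resulting interval, starting from a 1-second base
# (an assumption; per-sensor sampling periods can be configured differently).
zimon_intervals() {
    interval=1
    for factor in "$@"; do
        interval=$((interval * factor))
        echo "${interval}s"
    done
}

# Factors from the posted config: 10, 30, 24
zimon_intervals 10 30 24
# prints: 10s, 300s (5 minutes), 7200s (2 hours)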
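[Editor's note: a minimal sketch of the weekly snapshot procedure described in question 3 (quiesce the collector, tar up /opt/ibm/zimon, restart). It assumes the collector runs as the pmcollector systemd service and that /var/backups/zimon is the backup target; both are assumptions, and the function defaults to a dry run that prints the commands rather than executing them:]

# Sketch of the snapshot procedure described above. Dry-run by default
# (commands are echoed, not executed); set RUN= in the environment to
# execute for real. Service name and backup path are assumptions.
zimon_snapshot() {
    zimon_dir=/opt/ibm/zimon
    backup_dir=${1:-/var/backups/zimon}   # hypothetical backup location
    stamp=$(date +%Y%m%d)
    run=${RUN:-echo}
    $run systemctl stop pmcollector
    $run tar -czf "$backup_dir/zimon-$stamp.tar.gz" \
        -C "${zimon_dir%/*}" "${zimon_dir##*/}"
    $run systemctl start pmcollector
}

zimon_snapshot   # dry run: prints the three commands it would execute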
