Another data point for Keith/Kristy,

I’ve been using Zimon for about 18 months now, and I have to admit it’s been 
less than robust for long-term data. The biggest issue I’ve run into is the 
stability of the collector process. I’ve had it crash on a fairly regular 
basis, mostly due to memory usage, and each crash results in data loss. You 
can configure it in a highly available mode that should mitigate this to some 
degree. However, I don’t think IBM has published any details on how reliable 
the data collection process is.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <[email protected]> on behalf of Kristy 
Kallback-Rose <[email protected]>
Reply-To: gpfsug main discussion list <[email protected]>
Date: Sunday, September 24, 2017 at 2:29 PM
To: gpfsug main discussion list <[email protected]>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Experience with zimon database 
stability, and best practices for backup?

Hi Keith,

  We have barely begun with Zimon and have not (knock, knock) run up against 
any loss or corruption issues with Zimon.

  However, getting data out of Zimon for various reasons is something I have 
been thinking about. I'm interested partly because of the granularity that is 
lost over time, as with any round-robin-style data collection scheme.

So I guess one question is whether you have considered pulling the data out to 
another database, looked at the SS GUI, which uses a postgres db (iirc, about 
to take off on a flight and can't check), or looked at the Grafana bridge, 
which would get data into OpenTSDB format, again iirc. Anyway, just some 
things for consideration, and a request to share back whatever you find out if 
it's off list.
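
For what it's worth, one low-tech way to act on the "pull the data out" idea 
is a cron job that exports recent samples with mmperfmon before the 
round-robin files age them out. This is only a sketch: the metric names, 
bucket options, and export path below are assumptions to verify against the 
mmperfmon documentation for your Scale release.

```shell
#!/bin/sh
# Hypothetical nightly export of the last 24 hours of Zimon samples.
EXPORT_DIR=${EXPORT_DIR:-${TMPDIR:-/tmp}/zimon-exports}  # assumed archive path
outfile="$EXPORT_DIR/zimon_$(date +%F).txt"
mkdir -p "$EXPORT_DIR"

# 1440 one-minute buckets = 24 hours; cpu_user and gpfs_ns_read_ops are
# example metric names only - substitute whatever you actually collect.
mmperfmon query cpu_user,gpfs_ns_read_ops -b 60 -n 1440 > "$outfile"
```

From there the flat files could be loaded into whatever database you settle 
on.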

Thanks, getting stink eye to go to airplane mode.

More later.

Cheers
Kristy




On Sep 24, 2017 11:05 AM, "Keith Ball" <[email protected]> wrote:
Hello All,
In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather 
metrics. During a period of 2 months, we ended up losing data twice from the 
zimon database; once after the virtual disk serving both the OS files and zimon 
collector and DB storage was resized, and a second time after an unknown event 
(the loss was discovered when plotting in Grafana only went back to a certain 
date and time; likewise, mmperfmon query output only went back to the same 
time).
Details:
- Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node 
and other clients
- Data retention in the "raw" stratum was set to 2 months; the "domains" 
settings were as follows (note that we did not hit the 60 GB ceiling of the 
raw domain: 1 GB/file * 60 files):

domains = {
        # this is the raw domain
        aggregation = 0         # aggregation factor for the raw domain is always 0
        ram = "12g"             # amount of RAM to be used
        duration = "2m"         # keep highest-precision data for 2 months
        filesize = "1g"         # maximum file size
        files = 60              # number of files
},
{
        # first aggregation domain: aggregates to 10 seconds
        aggregation = 10
        ram = "800m"            # amount of RAM to be used
        duration = "6m"         # keep 10-second aggregates for 6 months
        filesize = "1g"         # maximum file size
        files = 10              # number of files
},
{
        # second aggregation domain: 30*10 seconds == 5 minutes
        aggregation = 30
        ram = "800m"            # amount of RAM to be used
        duration = "1y"         # keep 5-minute averages for 1 year
        filesize = "1g"         # maximum file size
        files = 5               # number of files
},
{
        # third aggregation domain: 24*30*10 seconds == 2 hours
        aggregation = 24
        ram = "800m"            # amount of RAM to be used
        duration = "2y"         # keep 2-hour averages for 2 years
        filesize = "1g"         # maximum file size
        files = 5               # number of files
}
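
As a sanity check on the ceiling noted above, the per-domain on-disk ceiling 
is just filesize * files; a quick loop over the values in the config:

```shell
# Per-domain disk ceiling implied by the config above (filesize * files).
# Fields: label, filesize in GB, file count.
for domain in "raw 1 60" "10s 1 10" "5min 1 5" "2h 1 5"; do
    set -- $domain
    echo "$1: $(( $2 * $3 )) GB ceiling"
done
# The raw domain tops out at 60 GB; the aggregated domains at 10, 5, and 5 GB.
```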

Questions:
1.) Has anyone had similar issues with losing data from zimon?

2.) Are there known circumstances where data could be lost, e.g. changing the 
aggregation domain definitions, or even simply restarting the zimon collector?

3.) Does anyone have any "best practices" for backing up the zimon database? We 
were taking weekly "snapshots" by shutting down the collector, and making a 
tarball copy of the /opt/ibm/zimon directory (but the database corruption/data 
loss still crept through for various reasons).
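
Not a fix for the corruption itself, but the snapshot procedure described in 
question 3 can be scripted along these lines. The collector service name 
(pmcollector) matches recent Scale releases; the destination path and 
retention count are assumptions to adjust.

```shell
#!/bin/sh
# Weekly Zimon snapshot: quiesce the collector so the DB files are
# consistent, tar them up, restart, and prune old tarballs.
BACKUP_DIR=${BACKUP_DIR:-${TMPDIR:-/tmp}/zimon-backups}  # assumed destination
STAMP=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"

systemctl stop pmcollector                 # zimon collector service on 4.2+
tar -czf "$BACKUP_DIR/zimon-$STAMP.tar.gz" -C /opt/ibm zimon
systemctl start pmcollector

# keep only the 8 most recent weekly tarballs
ls -1t "$BACKUP_DIR"/zimon-*.tar.gz 2>/dev/null | tail -n +9 | xargs -r rm -f
```

Restoring is the reverse: stop pmcollector, untar over /opt/ibm/zimon, start 
it again. Verifying that a restored copy actually answers mmperfmon queries 
is worth doing before trusting the scheme.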


In terms of debugging, we do not have Scale or zimon logs going back to the 
suspected dates of data loss; we do have a gpfs.snap from about a month after 
the last data loss - would it have any useful clues? Opening a PMR could be 
tricky, as it was the customer who had the support entitlement, and the 
environment (specifically the old cluster definition and the zimon collector 
VM) has since been torn down.


Many Thanks,
  Keith

--
Keith D. Ball, PhD
RedLine Performance Solutions, LLC
web:  http://www.redlineperf.com/
email: [email protected]
cell: 540-557-7851

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

