Hi All! We are in the process of moving from Nagios 2.5 to Nagios 3.0 with MySQL. We monitor and report on services for several customers and thus have a number of SLAs to consider. Currently we have a self-written reporting mechanism, but the developer is no longer with the company and the documentation is lacking in many areas. Since we are using NDO, we would prefer not to try to force the old mechanism to work with 3.0. So, we need a new reporting mechanism.
I looked at a couple of tools, but found nothing that seems close to finished and none that address adding downtimes after the fact. We cannot simply define a check_period or notification_period and consider that, because we need to monitor 24x7 and more or less prove we are monitoring even if there is scheduled maintenance. Also, there are cases where the service is down and it is not our fault, and per the SLA we do not subtract that time from availability. Therefore, we need a mechanism to add downtimes after the fact, which then prevents the reporting mechanism from counting that time.
NagiosSLA seems promising and I downloaded it from SourceForge. However, I do not find any mechanism to manage the SLA periods other than simply reporting everything within the check_period. Since we are using NDO, creating an extra EventHandler seems like a waste, yet report_script.pl seems to depend on the DB tables filled by the event handler. Looking at the script, I do not see much of a problem changing the table and column names. However, as far as I can tell, the sla_exclusion table is never really used. The exclusions are read into an array ( my @exclusion = retrieveData("sla_exclusion"); ) but @exclusion is never used after that. This means that every outage is reported.
Since we already have the data in MySQL, I thought about simply using the nagios_scheduleddowntime table. However, I see a problem with outages in the past. As far as I can tell, if you schedule a downtime in the past, it is silently ignored. Also, from what I see, the table is cleared when the outage is over. Both of these are logical to some extent, and I think my C is good enough to modify the code to either add all outages and not delete them or, more straightforwardly, simply write to a completely different table and avoid changing too much existing code.
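To make the requirement concrete, here is a minimal Python sketch of the calculation I have in mind. It is only an illustration under my own assumptions: the function names and the list-of-tuples layout are made up and are not taken from NagiosSLA or NDO, and I assume that excluded time is removed both from the counted outage and from the measured period (an SLA could just as well count excluded time as "up" instead).

```python
from datetime import datetime, timedelta

def merge_intervals(intervals):
    """Sort (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def clip(intervals, lo, hi):
    """Cut intervals down to the reporting period [lo, hi)."""
    clipped = []
    for start, end in intervals:
        start, end = max(start, lo), min(end, hi)
        if start < end:
            clipped.append((start, end))
    return clipped

def subtract(intervals, exclusions):
    """Return the parts of intervals not covered by exclusions.

    Both arguments must already be merged and sorted.
    """
    remaining = []
    for start, end in intervals:
        cursor = start
        for ex_start, ex_end in exclusions:
            if ex_end <= cursor:
                continue              # exclusion lies before what is left
            if ex_start >= end:
                break                 # exclusion lies after this outage
            if ex_start > cursor:
                remaining.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((cursor, end))
    return remaining

def total(intervals):
    """Sum the length of all intervals as a timedelta."""
    return sum((end - start for start, end in intervals), timedelta())

def availability(period_start, period_end, outages, exclusions):
    """Availability in percent, ignoring time inside exclusion windows.

    Returns None if the whole period is excluded.
    """
    excl = merge_intervals(clip(exclusions, period_start, period_end))
    down = merge_intervals(clip(outages, period_start, period_end))
    counted = subtract(down, excl)
    measured = (period_end - period_start) - total(excl)
    if measured <= timedelta(0):
        return None
    return 100.0 * (1.0 - total(counted) / measured)
```

The outages would come from the state history already in MySQL, and the exclusions from whatever table ends up holding the downtimes entered after the fact. An exclusion that only partly overlaps an outage removes just the overlapping part, which is what our SLAs require.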
So, the first question is whether there are any tools available to do SLA Reporting properly, FOSS or commercial. If not, does anyone have any suggestions about making changes to the existing code as I suggested? I would be grateful for any input. Regards, Jim Mohr