Hi Greg,

here is some input from a more abstract viewpoint: You already got feedback that monitoring the "up" status of the capture devices makes a lot of sense. In addition, I would suggest differentiating between the "reference implementation" (RI) of the capture agent and off-the-shelf devices built by manufacturers (OTS) when looking at additional monitoring requirements and techniques:
1) One of the most important features of the RI is its ability to get recording information from the scheduling system. Your capture agent may be up and running just fine but, for some reason, unable to access the scheduling feed. Therefore I suggest a Nagios plugin that uses the agent configuration to make an HTTP HEAD request to the calendar and only returns OK when that request comes back with a 200 (OK) or 304 (NOT MODIFIED) status code. Depending on how their scheduling system works, this may make sense for OTS agents as well.

2) I would assume that the CPU power of OTS agents matches their configuration options; that is, you can't configure the system in a way that makes recording consume more processing power than is available, resulting in dropped frames or otherwise crippled recordings. This may not be true for the RI, as it is a fairly open platform and people are already experimenting with different clock speeds, capture cards, etc. So for those configurations, CPU monitoring would be very helpful to make sure the CPU is not maxed out.

3) An additional possibility would be to check for recordings that are still on the capture agent, i.e. recordings that haven't been ingested yet. This might be caused by a missing network connection, the ingest system being down, etc., but it will lead to a failure to record new lectures, as the hard drive will soon fill up. I see this as a complementary (but more intelligent) check to the disk capacity check, since you never know how much data will be written once a recording has started. So a warning at 80 or 90% may not help if the recording keeps going.

Tobias

On 21.07.2011, at 17:42, Greg Logan wrote:
> Hi Folks,
>
> Cross posting to ensure everyone hears this, sorry about the spam!
>
> I'm the guy doing the Matterhorn integration work for the new Epiphan
> Matterhorn Capture Appliances (MCAs), as well as a developer for the
> normal Matterhorn capture agent (CA).
> We have about 10 CAs deployed on
> our campus currently, and one of our concerns was monitoring their
> status outside of Matterhorn itself. To do this we use Nagios to check
> if they're up by sshing into them, and we're working on the next step of
> checking the system health (smart, disk space, etc) as well as making
> sure they capture when they should.
>
> Does anyone else monitor their CAs in a similar way? Are you using
> Nagios as well? Munin? I ask, because we are at a stage in the MCA
> development where adding something like Nagios would be easy (memory
> space permitting!), so I'd like some community feedback in terms of what
> people are looking for in an SNMP monitor.
>
> Thanks,
> G
>
> _______________________________________________
> Matterhorn mailing list
> [email protected]
> http://lists.opencastproject.org/mailman/listinfo/matterhorn
>
>
> To unsubscribe please email
> [email protected]
> _______________________________________________

_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
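A minimal sketch of the calendar check described in point 1, written as a Python Nagios-style plugin. It is not taken from Matterhorn or the capture agent: the script name, the helper names (`classify`, `check_calendar`, `main`) and the idea of passing the calendar URL on the command line are hypothetical. A real plugin would read the URL from the agent configuration, whose exact key is not given here. The only conventions relied on are the standard Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) and the rule that only 200 or 304 counts as OK.

```python
import http.client
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Only these statuses mean "the agent can see its schedule".
ACCEPTED_STATUSES = {200, 304}


def classify(status):
    """Map an HTTP status code to a Nagios (exit code, message) pair."""
    if status in ACCEPTED_STATUSES:
        return OK, f"OK - calendar returned HTTP {status}"
    return CRITICAL, f"CRITICAL - calendar returned HTTP {status}"


def check_calendar(url, timeout=10.0):
    """Send a HEAD request to the calendar URL and classify the result.

    Hypothetical helper: the URL is passed in rather than read from the
    capture agent configuration.
    """
    # Raises ValueError for URLs without a scheme; main() reports UNKNOWN.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx codes, including 304, so classify here too.
        return classify(err.code)
    except (OSError, http.client.HTTPException) as err:
        # DNS failure, refused connection, timeout, malformed response, ...
        return CRITICAL, f"CRITICAL - calendar unreachable: {err}"


def main(argv):
    """Print a one-line Nagios status and return the matching exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_ca_calendar.py CALENDAR_URL")
        return UNKNOWN
    try:
        code, message = check_calendar(argv[1])
    except ValueError as err:
        code, message = UNKNOWN, f"UNKNOWN - bad calendar URL: {err}"
    print(message)
    return code
```

To run it as an actual plugin, one would add `import sys` and end the file with `sys.exit(main(sys.argv))` so Nagios sees the exit code. Connection problems map to CRITICAL rather than UNKNOWN because, from the recording point of view, an unreachable calendar is exactly the failure this check is meant to catch.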
