Hi Greg,

Some input from a more abstract viewpoint: you already got feedback that
monitoring the "up" status of the capture devices makes a lot of sense. In
addition, I would suggest differentiating between the "reference
implementation" (RI) of the capture agent and manufacturer-built off-the-shelf
devices (OTS) when looking at additional monitoring requirements and
techniques:

1) One of the most important features of the RI is the capability to get
recording information from the scheduling system. Your capture agent may be up
and running just fine but, for some reason, unable to access the scheduling
feed. Therefore I suggest a Nagios plugin that uses the agent configuration to
make an HTTP HEAD request to the calendar and only returns OK when that request
comes back with a 200 (OK) or 304 (Not Modified) status code.

This may make sense for OTS agents as well depending on how their scheduling 
system works.
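
To make the idea in 1) a bit more concrete, here is a minimal sketch of such a
check in Python. It is only an illustration under a few assumptions: the
calendar URL is passed in directly (reading it from the agent configuration is
left out), and the function names, the timeout and the decision to treat every
other status code as CRITICAL are my own choices, not anything the RI
prescribes.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_status(status):
    """Map an HTTP status code to a Nagios state and a one-line message."""
    if status in (200, 304):
        return OK, "OK - calendar answered with HTTP %d" % status
    # Assumption: anything other than 200/304 counts as a failed check.
    return CRITICAL, "CRITICAL - calendar answered with HTTP %d" % status


def check_calendar(url, timeout=10.0):
    """Send a HEAD request to the calendar URL and classify the outcome."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304 and for 4xx/5xx responses;
        # the status code is still available on the exception.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout and similar problems.
        return CRITICAL, "CRITICAL - calendar unreachable: %s" % err
    except ValueError as err:
        # Malformed URL, most likely a configuration problem.
        return UNKNOWN, "UNKNOWN - bad calendar URL: %s" % err
```

A wrapper script would call check_calendar() with the URL taken from the agent
configuration, print the message and exit with the returned code, which is how
Nagios picks up the state.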

2) I would assume that the CPU power of an OTS agent matches its configuration
options, that is, you can't configure the system in a way that recording would
eat up more processing power than is available, resulting in dropped frames or
otherwise crippled recordings.

By contrast, this may not hold for the RI, as it is a fairly open platform and
people are already experimenting with different clock speeds, capture cards
etc. So what I'm suggesting is that for those configurations, CPU monitoring
would be very helpful in making sure that the CPU is not maxed out.
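
As a rough sketch of what such a CPU check could look like, the following
Python snippet uses the 5-minute load average divided by the number of cores
as a stand-in for "CPU is maxed out". Note that load average is not the same
thing as CPU utilisation, and the thresholds (0.8 and 1.0 per core) as well as
the function names are assumptions for illustration only. os.getloadavg() is
available on Unix-like systems only.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_load(load_per_core, warn=0.8, crit=1.0):
    """Map a per-core load value to a Nagios state and message."""
    if load_per_core >= crit:
        return CRITICAL, "CRITICAL - load per core is %.2f" % load_per_core
    if load_per_core >= warn:
        return WARNING, "WARNING - load per core is %.2f" % load_per_core
    return OK, "OK - load per core is %.2f" % load_per_core


def check_cpu(warn=0.8, crit=1.0):
    """Read the 5-minute load average and normalise it by the core count."""
    try:
        _one_min, five_min, _fifteen_min = os.getloadavg()
    except (AttributeError, OSError):
        # Not available on this platform, or the value could not be read.
        return UNKNOWN, "UNKNOWN - load average not available"
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return classify_load(five_min / cores, warn, crit)
```

The thresholds would have to be tuned per hardware setup, ideally by watching
the values during a test recording with the intended capture configuration.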

3) An additional possibility would be to check for recordings that are still on
the capture agent (which means they haven't been ingested). This might be due
to a missing network connection, the ingest system being down etc., but it will
lead to a failure to record new lectures as the hard drive will soon fill up.

I see this as a complementary (but more intelligent) check to the disk capacity 
check, as you never know how much data will be written once a recording has 
started. So warning at 80 or 90% may not help if the recording keeps going.
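
A check along the lines of 3) could be sketched as follows. This assumes that
each recording lives in its own subdirectory of a common folder and disappears
from there after a successful ingest; I am not claiming that this is exactly
how the agent lays out its files. The age limit, the thresholds and the
function names are hypothetical as well.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stale_recordings(root, max_age_hours=6.0, now=None):
    """Return the names of subdirectories last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(root) as entries:
        for entry in entries:
            # Assumption: one subdirectory per recording that has not been
            # ingested yet.
            if entry.is_dir() and entry.stat().st_mtime < cutoff:
                stale.append(entry.name)
    return sorted(stale)


def check_backlog(root, max_age_hours=6.0, warn=1, crit=3):
    """Classify the number of old, still-present recordings."""
    try:
        stale = stale_recordings(root, max_age_hours)
    except OSError as err:
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (root, err)
    count = len(stale)
    if count >= crit:
        return CRITICAL, "CRITICAL - %d recordings not ingested" % count
    if count >= warn:
        return WARNING, "WARNING - %d recordings not ingested" % count
    return OK, "OK - no recordings waiting for ingest"
```

The age limit is there so that a recording which is currently running or has
just finished does not trigger an alarm right away.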

Tobias

On 21.07.2011, at 17:42, Greg Logan wrote:

> Hi Folks,
> 
> Cross posting to ensure everyone hears this, sorry about the spam!
> 
> I'm the guy doing the Matterhorn integration work for the new Epiphan
> Matterhorn Capture Appliances (MCAs), as well as a developer for the
> normal Matterhorn capture agent (CA).  We have about 10 CAs deployed on
> our campus currently, and one of our concerns was monitoring their
> status outside of Matterhorn itself.  To do this we use Nagios to check
> if they're up by sshing into them, and we're working on the next step of
> checking the system health (smart, disk space, etc) as well as making
> sure they capture when they should.
> 
> Does anyone else monitor their CAs in a similar way?  Are you using
> Nagios as well?  Munin?  I ask, because we are at a stage in the MCA
> development where adding something like Nagios would be easy (memory
> space permitting!), so I'd like some community feedback in terms of what
> people are looking for in an SNMP monitor.
> 
> Thanks,
> G
> 
> _______________________________________________
> Matterhorn mailing list
> [email protected]
> http://lists.opencastproject.org/mailman/listinfo/matterhorn
> 
> 
> To unsubscribe please email
> [email protected]
> _______________________________________________
