Chris,

A fine point, but I can't help but wonder whether they are related. I'll leave a window attached to the machine and see what I can figure out.
I'm still working on getting syslog-ng going so I can use Splunk to see what's really going on, so I won't have answers for a while...

--Steven.

--
Steven Lichti
Academic Technologies
Northwestern University
[email protected]
(847) 467-7805

On 9/30/11 12:34 PM, "Christopher Brooks" <[email protected]> wrote:

>Steven,
>
>Matterhorn doesn't do anything with SSH, so if the machines are not
>under a really high load, SSH should respond. Can you check if the
>machines are under high load? How long does it take for a machine to
>get inaccessible, and is it fairly repeatable? Can you stay SSH'ed
>into the machine and just watch top to see if the load average jumps up?
>
>Unless MH is causing a high load, I think this is unrelated to MH.
>
>(Ubuntu 10.10?)
>
>Chris
>
>On Fri, 30 Sep 2011 03:40:03 +0000
>Steven M Lichti <[email protected]> wrote:
>
>> Chris,
>>
>> I'm having a problem sort of like this. My capture agents are
>> dropping off the air, and while they are marked as offline, they are
>> still inaccessible. I can ping them, but not ssh to them. I'm at a
>> complete loss as to why these machines stop responding. I've taken to
>> restarting them a couple of times per morning to make sure they're
>> alright, and that has seemed to help a bit.
>>
>> I've also checked the system log files, but haven't found anything
>> useful...
>>
>> --Steven.
>>
>> From: Rubén Pérez <[email protected]>
>> Reply-To: Matterhorn Users <matterhorn-users@opencastproject.org>
>> Date: Fri, 30 Sep 2011 01:38:53 +0200
>> To: Matterhorn Users <matterhorn-users@opencastproject.org>
>> Subject: Re: [Matterhorn-users] Heartburn
>>
>> Hi Chris,
>>
>> We do have the same problem around here and it has been driving us
>> crazy in our new pilot preliminary test.
>> Can you elaborate on what the "heartbeat" is? I understand it is
>> some kind of "keep-alive" to let the system know the machine is
>> operative. What is the method you used to disable it?
>>
>> Thanks for your answers.
>>
>> Best regards
>> Rubenciño
>>
>> 2011/9/29 Christopher Brooks <[email protected]>
>>
>> Hi,
>>
>> Our machines constantly get marked as offline. Seems like under load
>> the heartbeat isn't getting through (for whatever reason). We're
>> disabling the heartbeat on our local system to make up for this.
>>
>> Anyone else having these issues on a distributed deployment?
>>
>> Looking for people who might also be running into this, to help test
>> potential patches for 1.2.1.
>>
>> Chris
>>
>> --
>> Christopher Brooks, BSc, MSc
>> ARIES Laboratory, University of Saskatchewan
>>
>> Web: http://www.cs.usask.ca/~cab938
>> Phone: 1.306.966.1442
>> Mail: Advanced Research in Intelligent Educational Systems Laboratory
>> Department of Computer Science
>> University of Saskatchewan
>> 176 Thorvaldson Building
>> 110 Science Place
>> Saskatoon, SK
>> S7N 5C9
>> _______________________________________________
>> Matterhorn-users mailing list
>> Matterhorn-users@opencastproject.org
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn

To unsubscribe please email [email protected]
_______________________________________________
