Chris,

A fine point, but I can't help but wonder whether they are related. I'll
leave a window attached to the machine and see what I can figure out.

I'm still working on getting syslog-ng going so I can use Splunk to see
what's really going on, so I won't have answers for a while...

--Steven.

-- 
Steven Lichti
Academic Technologies
Northwestern University
[email protected]
(847) 467-7805


On 9/30/11 12:34 PM, "Christopher Brooks" <[email protected]> wrote:

>Steven,
>
>Matterhorn doesn't do anything with SSH, so if the machines are not
>under a really high load, SSH should respond.  Can you check if the
>machines are under high load?  How long does it take for a machine to
>get inaccessible, and is it fairly repeatable?  Can you stay SSH'ed
>into the machine and just watch top to see if the load average jumps up?
>
>Unless MH is causing a high load, I think this is unrelated to MH.
>
>(Ubuntu 10.10?)
>
>Chris
>
>On Fri, 30 Sep 2011 03:40:03 +0000
>Steven M Lichti <[email protected]> wrote:
>
>> Chris,
>> 
>> I'm having a problem sort of like this. My capture agents are
>> dropping off the air, and while they are marked as offline, they are
>> still inaccessible. I can ping them, but not ssh to them. I'm at a
>> complete loss as to why these machines stop responding. I've taken to
>> restarting them a couple of times per morning to make sure they're
>> alright, and that has seemed to help a bit.
>> 
>> I've also checked the system log files, but haven't found anything
>> useful...
>> 
>> --Steven.
>> 
>> --
>> Steven Lichti
>> Academic Technologies
>> Northwestern University
>> [email protected]
>> (847) 467-7805
>> 
>> 
>> 
>> From: Rubén Pérez <[email protected]>
>> Reply-To: Matterhorn Users <matterhorn-users@opencastproject.org>
>> Date: Fri, 30 Sep 2011 01:38:53 +0200
>> To: Matterhorn Users <matterhorn-users@opencastproject.org>
>> Subject: Re: [Matterhorn-users] Heartburn
>> 
>> Hi Chris,
>> 
>> We do have the same problem around here and it has been driving us
>> crazy in our new pilot preliminary test. Can you elaborate on what
>> the "heartbeat" is? I understand it is some kind of "keep-alive" to
>> let the system know the machine is operative. What is the method you
>> used to disable it?
>> 
>> Thanks for your answers.
>> 
>> Best regards
>> Rubenciño
>> 
>> 2011/9/29 Christopher Brooks <[email protected]>
>> 
>> Hi,
>> 
>> Our machines constantly get marked as offline.  Seems like under load
>> the heartbeat isn't getting through (for whatever reason).  We're
>> disabling the heartbeat on our local system to make up for this.
>> 
>> Anyone else having these issues on a distributed deployment?
>> 
>> Looking for people who might also be running into this, to help test
>> potential patches for 1.2.1.
>> 
>> Chris
>> 
>> --
>> Christopher Brooks, BSc, MSc
>> ARIES Laboratory, University of Saskatchewan
>> 
>> Web: http://www.cs.usask.ca/~cab938
>> Phone: 1.306.966.1442
>> Mail: Advanced Research in Intelligent Educational Systems Laboratory
>>     Department of Computer Science
>>     University of Saskatchewan
>>     176 Thorvaldson Building
>>     110 Science Place
>>     Saskatoon, SK
>>     S7N 5C9
>> _______________________________________________
>> Matterhorn-users mailing list
>> Matterhorn-users@opencastproject.org
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
>> 
>
>
>
>-- 
>Christopher Brooks, BSc, MSc
>ARIES Laboratory, University of Saskatchewan
>
>Web: http://www.cs.usask.ca/~cab938
>Phone: 1.306.966.1442
>Mail: Advanced Research in Intelligent Educational Systems Laboratory
>     Department of Computer Science
>     University of Saskatchewan
>     176 Thorvaldson Building
>     110 Science Place
>     Saskatoon, SK
>     S7N 5C9

_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn


To unsubscribe please email
[email protected]
_______________________________________________