John,

Our full node (4 machines) is designed with some fault tolerance in mind in
terms of equipment, but mostly because we have a portable node that can take
up the slack if hardware failures occur, and we built in a "5th" machine
just in case.

Our two capture machines can each do both video and audio capture, so that
if one fails, the other can take up the slack. We've actually had to do this
before when we had an OS failure on one of the capture machines. It took
about 15 minutes to realize we couldn't recover it in time and to switch
things over (largely because I had had the student practice this about 3
weeks prior).

If the echo canceler itself fails, it can temporarily be relieved by the
4-channel echo canceler on our PIG node. The PIG node can also function in
place of the display node should that fail. And we have a 5th machine with a
couple of spare capture cards that currently serves as our venue server, but
it is ready at a moment's notice to become a video/audio capture machine as
well.

If we bring everything into play, we actually have (from our full node with
our portable as a supplement) 14 video captures, 4 audio captures, two echo
cancelers, 4 speakers, 4 projectors, two 61" plasmas, and 8 microphones (2
wireless) to bring to bear, mostly because our portable node carries a full
complement of audio and video gear of its own. Granted, we can overload its
bus pretty easily if we really want to, but it serves both as redundancy for
the full node and as a node in its own right for auditoriums and small
conference rooms, with the ability to run 2 projectors in addition to its
own two operator displays (or 1 display and 3 projectors).

I've found that with the AG 2.x toolkit it is extremely easy to be flexible
about which resources you bring in. I have a two-room node run by 6
machines, and the rooms are near each other. The XAP800 runs BOTH ROOMS;
it's really designed to handle multiple rooms. I split up the displays and
the captures, and I even run two service managers (one hard-coded to a
separate port) on one of the machines, running video for one room and audio
for the other to split that machine's load between the two rooms.
Essentially it's two 3-machine nodes thrown together in the same rack; as a
result, we've used one of the rooms as a local overflow room when we had too
many in the audience to fit into one. Quite flexible indeed. I've even run
ALL the nodes on campus as one huge node... just to try it. We have 3 full
nodes on campus (going to double very soon) and one portable.
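
If you want to script that two-manager split, here's a minimal sketch in
Python of what I mean. The --port flag, the port numbers, and the launch
command are assumptions for illustration only; check AGServiceManager.py
--help in your toolkit install for the real option names.

    import subprocess

    # Minimal sketch: two AG 2.x service managers on one box, one per room.
    # ASSUMPTION: the "--port" option and the port numbers below are
    # illustrative; verify the real flag names in your install.
    managers = {
        12000: "room A video",
        12002: "room B audio",
    }

    procs = []
    for port, role in managers.items():
        print("starting service manager for %s on port %d" % (role, port))
        # Each manager registers with the node independently, so the node
        # sees two logical service hosts on one physical machine.
        procs.append(subprocess.Popen(
            ["python", "AGServiceManager.py", "--port", str(port)]))

    for p in procs:
        p.wait()  # keep both managers up until they exit

Each room's node configuration then just points at its own service manager,
and one box quietly carries both rooms.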

-John Q.

--
John I. Quebedeaux, Jr.; Louisiana State University
Computer Manager LBRN; 131 Life Sciences Bldg.
e-mail: jo...@lsu.edu; web: http://lbrn.lsu.edu
phone: 225-578-0062 / fax: 225-578-2597

On Oct 31, 2005, at 9:22 AM, John Langkals wrote:

   Hello AGTech,

   How do you support fault tolerance within your Access Grid node? If you
were to experience a catastrophic failure of your node hardware, what kind
of backup have you designed into your production nodes to maintain service?

   Thank you,

   John

   John Langkals
   Systems Manager
   OCTS
   M2021 Physics Research Building
   191 West Woodruff Avenue
   Columbus, Ohio 43210
   614.292.6957 Office
   614.327.3732 Cell
   614.292.7557 FAX
   www.octs.osu.edu