John,

Our full node (4 machines) is designed with some fault tolerance in mind in terms of equipment, but mostly because we have a portable node that can take up the slack if hardware fails, and we built in a "5th" machine for "just in case".
Our two capture machines can each do both video and audio capture, so that if one fails, the other can take up the slack. We've actually had to do this before when one of the capture machines had an OS failure. It took about 15 minutes to realize we couldn't recover it in time and to switch things over (largely because I had had the student practice this about 3 weeks prior). If the echo canceler itself fails, it can temporarily be relieved by the 4-channel echo canceler on our pig node. Our pig node can also function in place of the display node should it fail. We have a 5th machine with a couple of spare capture cards that currently serves as our venue server, but it is ready at a moment's notice to be a video/audio capture machine as well.

If we bring everything into play, we actually have (from our full node with our portable as supplement) 14 video captures, 4 audio captures, two echo cancelers, 4 speakers, 4 projectors, two 61" plasmas, and 8 microphones (2 wireless) to bring to bear, mostly because our portable node has a full complement of audio and video to go with it. Granted, we can overload its bus pretty easily if we really want to, but it serves as redundancy for the full node and as a node in its own right for auditoriums and small conference rooms, with the ability to run 2 projectors in addition to its own two operator displays (or 1 display and 3 projectors).

I've found that with the AG 2.x toolkit it is extremely easy to be flexible with regard to what resources you want to bring in. I have a two-room node run by 6 machines, and the rooms are nearby. The XAP800 runs BOTH ROOMS. Why? Because it's designed to run many rooms; that's what it can really do. I split up the displays and the captures, and I even run two service managers on one of the machines (one hard-coded to a separate port), where I run video for one room and audio for the other, splitting that machine's load between the two rooms.
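For anyone curious what the two-managers-on-one-box arrangement looks like in the abstract, here is a minimal generic sketch (plain Python sockets, nothing AG-specific): two independent service endpoints pinned to distinct, hard-coded ports on one host, so each can be addressed and loaded separately. The port numbers and function names are made up for illustration; the real AG 2.x service managers are started with the toolkit's own scripts and defaults.

```python
import socket

def bind_manager(port):
    """Bind a listening socket that stands in for one service manager."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

# Hypothetical ports: one manager handles video for room A,
# the other handles audio for room B, on the same machine.
video_mgr = bind_manager(12000)
audio_mgr = bind_manager(12002)

print(video_mgr.getsockname()[1], audio_mgr.getsockname()[1])
```

The point is simply that nothing forces one manager per machine: as long as each instance has its own port, one box can serve resources to two rooms independently.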
Essentially it's two 3-machine nodes thrown together in the same rack. As a result, we've been able to use one of the rooms as a local overflow room when we had too many in the audience to fit into one room. Quite flexible indeed. I've run ALL the nodes on campus as one huge node... just to try it. We have 3 full nodes on campus (going to double very soon) and one portable.

-John Q.

--
John I. Quebedeaux, Jr.; Louisiana State University
Computer Manager LBRN; 131 Life Sciences Bldg.
e-mail: jo...@lsu.edu; web: http://lbrn.lsu.edu
phone: 225-578-0062 / fax: 225-578-2597

On Oct 31, 2005, at 9:22 AM, John Langkals wrote:

> Hello AGTech,
>
> How do you support fault tolerance within your Access Grid node? If you were to experience a catastrophic failure of your node hardware, what kind of backup have you designed into your production nodes to maintain service?
>
> Thank you,
>
> John
>
> John Langkals
> Systems Manager, OCTS
> M2021 Physics Research Building
> 191 West Woodruff Avenue
> Columbus, Ohio 43210
> 614.292.6957 Office
> 614.327.3732 Cell
> 614.292.7557 FAX
> www.octs.osu.edu