I recently had the interesting experience of bringing up a rack of 20 dual-processor Itanium 2 machines (HP zx6000s), all now running Debian, over a 48-hour period. The interesting part was that all the equipment showed up on Monday, and we had the classic "very big demo" scheduled for Thursday morning. The decision was made to see how far along we could get toward actually using the new rack for the demo, so off we went... Tuesday was spent plugging in cables, setting things up, and then we launched into installing Debian from CD (this was on a physically isolated network). I've installed Debian (and Red Hat) on a number of machines, but the entire stack was a bit daunting... (also warm; you could feel the heat when you walked down the hallway)
Once the two sysadmins working with me got the hang of it, they started installing like gangbusters, while I still had to port a critical part of the software to Linux. As I had never seen a live data feed for this application before, there were several code bugs in the networking (it turned out what I ported didn't run on any platform anymore...). Prior to this I had added byte swapping to the networking code, so I knew that once I got the networking issues fixed, I could "probably" read the data... Surprisingly, that only took a few hours. Then I took this code and started to modify the networking code to read a 32-bit big-endian data feed on a 64-bit little-endian system. Some things like long, time_t, and bool change between the two platforms, so I had to hack on the data types and do some nasty munging as I read in the data, but things again came up in a few hours.

About that time, we had the first 8 machines on the rack fully installed and booting Debian. By installing a FAT16 partition on the other drive, we managed to make them all dual boot, although I don't think we'll be using HP-UX much... At this point we called it a day.

On Wednesday I started bringing up the freshly ported software on the machines that were already working, while they kept cranking to get the rest installed. This I mostly did by NFS-mounting the executables and doing the other system setup tasks needed to run this large application. Eventually the data feed went down, and despite a series of "early morning" phone calls to the east coast, nobody could fix it till daybreak. I brought up the other machines off some prerecorded data files I had made earlier in the day, and eventually had our software running on the entire rack of 20. I used an HP wx6000 machine, also running Linux, to remotely display the GUIs for each of the machines on the rack, one machine per workspace. Surprisingly, a single wx6000 managed to remotely display reasonably high-end graphics from all 20 machines.
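To give a flavor of the munging involved: when the sender is a 32-bit big-endian box and the reader is a 64-bit little-endian one, you can't just overlay a struct on the wire bytes, because long and time_t grow to 64 bits on the new host and every multi-byte field arrives in the wrong byte order. A minimal sketch (the record layout and field names here are hypothetical, not the actual TFAS/CTAS format) unpacks field by field into explicitly sized types and swaps with ntohl():

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() */

/* Hypothetical 12-byte wire record from a 32-bit big-endian sender.
 * Each field was 32 bits on the sender, even the ones declared as
 * long or time_t there, so on the 64-bit host we read them into
 * fixed-width int32_t/uint32_t instead of the native types. */
struct track_msg {
    uint32_t id;        /* unsigned 32-bit on both ends        */
    int32_t  timestamp; /* was time_t (32-bit) on the sender   */
    int32_t  altitude;  /* was long (32-bit) on the sender     */
};

static void unpack_track_msg(const unsigned char *buf, struct track_msg *out)
{
    uint32_t tmp;

    /* memcpy avoids unaligned access, which matters on ia64 */
    memcpy(&tmp, buf + 0, 4);
    out->id = ntohl(tmp);

    memcpy(&tmp, buf + 4, 4);
    out->timestamp = (int32_t)ntohl(tmp);

    memcpy(&tmp, buf + 8, 4);
    out->altitude = (int32_t)ntohl(tmp);
}
```

The memcpy-then-swap pattern is deliberate: dereferencing a misaligned uint32_t pointer is exactly the kind of thing that "worked" on the old platform and faults or traps on Itanium.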
The live data feed had come back up by morning, and we had all 20 machines up and running about 30 minutes before all the suits came into the room for the demo. :-)

General impressions: what a great platform! Debian installed like a champ, and each machine came up pretty efficiently. The only big problem I had is that GCC 2.96 sucks, and 3.0.4 isn't much better (we were running stable woody). I finally had to go with a recent build I had done of GCC "3.4" from a few-weeks-old CVS tree. The other versions kept having weird problems with C++ and system headers, which just "went away" with a more recent GCC. I had to give up on our one XView-based application (thanks to this list, now I know why...), but managed to get the new Java version up instead using Sun's Java for Red Hat 7.3.

A few other thoughts. I'd really love to see a port of Valgrind to the Itanium; it's my favorite memory checker. I'd also really like to see the NPTL work supported. I'm tired of always having to rewrite POSIX semaphore code to use SVR4 semaphores.

The software was NASA's Traffic Flow Automation System (TFAS), an experimental project for strategic Air Traffic Management based on NASA's Center-TRACON Automation System (CTAS), which has been operationally deployed by the FAA in a number of air traffic control centers (http://ctas.arc.nasa.gov). Each machine runs CTAS software that predicts the movements of aircraft within an Air Route Traffic Control Center (ARTCC). There are 20 ARTCCs in the continental US, so 20 machines cover all the traffic in the country. If TFAS is accepted for use by the FAA (this is still under consideration), it will definitely be considered a 'mission critical' application. We've tested TFAS with UltraSPARC/Solaris, HP's PA-RISC/HP-UX, Pentium/Linux, and PowerPC/Mac OS X, and so far the Itanium/Linux combo is our platform of choice. So one of these days you'll be flying on ia64-Linux-based systems. :-)

- rob -

