Scott> What are people's experiences with CentOS 5.0 64-bit installed
Scott> on dual 3 GHz quad-core PE2950 systems with 32 GB RAM each,
Scott> high-performance computing (applications that tax both the CPUs
Scott> and RAM), not currently in a Beowulf cluster but could adapt to
Scott> that, and doing so with VMware or other virtualization software
Scott> vs activity being done directly in the OS?

I work at a place which has racks of dual Opteron boxes with 16 GB of
RAM, and others with 4 CPU, 4 core, 128 GB memory machines, etc. We're
doing ASIC design and simulations, so speed and memory are important
to us.

Scott> How much of a performance hit, or gain (I'd presume hit), does
Scott> virtualization cause an application, resulting in what
Scott> percentage poorer or better (I'd presume poorer) performance vs
Scott> dealing directly with the OS?

Umm... why do you want to virtualize compute nodes? What are you
trying to achieve?

Scott> It would be nice to have a VM perform some work, and if a
Scott> person's code or application breaks, have it take down a VM
Scott> while keeping a machine up, and not affecting other people's
Scott> work.

Umm... generally, if code breaks in userspace, the OS won't crash.
We've never experienced user code taking down one of our boxes, and we
do lots of runs here, with systems up and running for months at a
time and hundreds or thousands of jobs running through them.

In general, they go down due to hardware problems, not software,
especially with compute jobs.

Why do you think the entire system will go down when someone's code
breaks? Or are you worried that someone will write code which, due to
a bug, fills up all the memory on a machine? In that case, per-process
resource limits and strict memory overcommit settings are the way to
go.

Scott> It may also depend on if an application or code is written
Scott> directly with/for the physical cpu/hardware vs more general use
Scott> (VM).

Scott> Thanks for insights and experiences.

Personally, I wouldn't bother to virtualize my compute cluster at all.
I'd just put it all into a batch scheduling system (Condor, Sun Grid
Engine, or LSF if you have money) and let the batch system load-level
the resources.

John
_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
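
On the resource-limits suggestion for runaway-memory jobs: here is a minimal sketch, assuming Linux and Python, of launching a job with a cap on its address space so that a buggy allocation fails inside that one process instead of exhausting the machine. The names `limit_address_space` and `run_capped` and the 1 GiB cap are illustrative assumptions, not anything from the thread; in practice a shell `ulimit -v` or the batch system's own limits would likely do the same job.

```python
import resource
import subprocess
import sys


def limit_address_space(max_bytes):
    """Return a function that caps the calling process's address space."""
    def apply_limit():
        # RLIMIT_AS bounds the total virtual memory of the process;
        # allocations beyond it fail instead of consuming the machine.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply_limit


def run_capped(argv, max_bytes):
    """Run argv as a child process under a memory cap; return its exit code."""
    # preexec_fn runs in the child after fork() and before exec(), so the
    # limit applies only to the job, not to this launcher.
    proc = subprocess.run(argv, preexec_fn=limit_address_space(max_bytes))
    return proc.returncode


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Hypothetical "buggy job": tries to allocate 4 GiB under a 1 GiB cap.
    greedy = [sys.executable, "-c", "x = bytearray(4 * 1024**3)"]
    print("exit code:", run_capped(greedy, one_gib))
```

The job dies with an allocation error while the host and everyone else's work carry on. Strict overcommit is the complementary, system-wide setting (on Linux, the `vm.overcommit_memory = 2` sysctl), which makes the kernel refuse allocations it cannot back rather than overcommitting and killing processes later.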
