I think everyone has similar thoughts, but the presentation provides some
real data and experience.
BTW, for those interested, I have a new poll on ClusterMonkey asking
about clouds and HPC. (http://www.clustermonkey.net/)
The last poll was on GP-GPU use.

--
Doug

> Doug,
>
> Thanks for posting that video. It confirmed what I always suspected
> about clouds for HPC.
>
> Prentice
>
> On 10/03/2011 08:25 AM, Douglas Eadline wrote:
>> Interesting and pragmatic HPC cloud presentation, worth watching
>> (25 minutes)
>>
>> http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/
>>
>> --
>> Doug
>>
>>> http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
>>>
>>> $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
>>>
>>> By Jon Brodkin | Published September 20, 2011 10:49 AM
>>>
>>> Amazon EC2 and other cloud services are expanding the market for
>>> high-performance computing. Without access to a national lab or a
>>> supercomputer in your own data center, cloud computing lets
>>> businesses spin up temporary clusters at will and stop paying for
>>> them as soon as the computing needs are met.
>>>
>>> A vendor called Cycle Computing is on a mission to demonstrate the
>>> potential of Amazon's cloud by building increasingly large clusters
>>> on the Elastic Compute Cloud. Even with Amazon, building a cluster
>>> takes some work, but Cycle combines several technologies to ease
>>> the process and recently used them to create a 30,000-core cluster
>>> running CentOS Linux.
>>>
>>> The cluster, announced publicly this week, was created for an
>>> unnamed "Top 5 Pharma" customer, and ran for about seven hours at
>>> the end of July at a peak cost of $1,279 per hour, including the
>>> fees to Amazon and Cycle Computing. The details are impressive:
>>> 3,809 compute instances, each with eight cores and 7GB of RAM, for
>>> a total of 30,472 cores, 26.7TB of RAM and 2PB (petabytes) of disk
>>> space. Security was ensured with HTTPS, SSH and 256-bit AES
>>> encryption, and the cluster ran across data centers in three Amazon
>>> regions in the United States and Europe. The cluster was dubbed
>>> "Nekomata."
>>>
>>> Spreading the cluster across multiple continents was done partly
>>> for disaster recovery purposes, and also to guarantee that 30,000
>>> cores could be provisioned. "We thought it would improve our
>>> probability of success if we spread it out," Cycle Computing's Dave
>>> Powers, manager of product engineering, told Ars. "Nobody really
>>> knows how many instances you can get at any one time from any one
>>> [Amazon] region."
>>>
>>> Amazon offers its own special cluster compute instances, at a
>>> higher cost than regular-sized virtual machines. These cluster
>>> instances provide 10 Gigabit Ethernet networking along with greater
>>> CPU and memory, but they weren't necessary to build the Cycle
>>> Computing cluster.
>>>
>>> The pharmaceutical company's job, related to molecular modeling,
>>> was "embarrassingly parallel," so a fast interconnect wasn't
>>> crucial. To further reduce costs, Cycle took advantage of Amazon's
>>> low-price "spot instances." To manage the cluster, Cycle Computing
>>> used its own management software as well as the Condor
>>> High-Throughput Computing software and Chef, an open source systems
>>> integration framework.
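An aside on the "embarrassingly parallel" bit above: each modeling task
is completely independent, so the nodes never need to talk to each other
and a slow interconnect costs you nothing; throughput just scales with
core count. Here is a toy sketch of that pattern in Python. The score()
function and the made-up ligand names are placeholders for illustration,
not the actual pharma workload:

  # Embarrassingly parallel: every task is independent, so we just fan
  # them out across cores; no communication happens between tasks.
  from concurrent.futures import ProcessPoolExecutor

  def score(ligand):
      # stand-in for an independent, CPU-bound molecular-modeling task
      return ligand, sum(ord(c) for c in ligand) % 100

  ligands = ["mol-%05d" % i for i in range(1000)]

  if __name__ == "__main__":
      with ProcessPoolExecutor() as pool:   # one worker per core by default
          results = dict(pool.map(score, ligands))
      print(len(results), "ligands scored")

Condor does the same thing across machines instead of cores: you queue a
few thousand independent jobs and let the scheduler farm them out to
whatever slots happen to be free.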
>>> Cycle demonstrated the power of the Amazon cloud earlier this year
>>> with a 10,000-core cluster built for a smaller pharma firm called
>>> Genentech. Now, 10,000 cores is a relatively easy task, says Powers.
>>> "We think we've mastered the small-scale environments," he said.
>>> 30,000 cores isn't the end game, either. Going forward, Cycle plans
>>> bigger, more complicated clusters, perhaps ones that will require
>>> Amazon's special cluster compute instances.
>>>
>>> The 30,000-core cluster may or may not be the biggest one run on
>>> EC2. Amazon isn't saying.
>>>
>>> "I can't share specific customer details, but can tell you that we
>>> do have businesses of all sizes running large-scale,
>>> high-performance computing workloads on AWS [Amazon Web Services],
>>> including distributed clusters like the Cycle Computing 30,000 core
>>> cluster to tightly-coupled clusters often used for science and
>>> engineering applications such as computational fluid dynamics and
>>> molecular dynamics simulation," an Amazon spokesperson told Ars.
>>>
>>> Amazon itself actually built a supercomputer on its own cloud that
>>> made it onto the list of the world's Top 500 supercomputers. With
>>> 7,000 cores, the Amazon cluster ranked number 232 in the world last
>>> November with speeds of 41.82 teraflops, falling to number 451 in
>>> June of this year. So far, Cycle Computing hasn't run the Linpack
>>> benchmark to determine the speed of its clusters relative to Top
>>> 500 sites.
>>>
>>> But Cycle's work is impressive no matter how you measure it. The
>>> job performed for the unnamed pharma company "would take well over
>>> a week for them to run internally," Powers says. In the end, the
>>> cluster performed the equivalent of 10.9 "compute years of work."
>>>
>>> The task of managing such large cloud-based clusters forced Cycle
>>> to step up its own game, with a new plug-in for Chef the company
>>> calls Grill.
>>>
>>> "There is no way that any mere human could keep track of all of the
>>> moving parts on a cluster of this scale," Cycle wrote in a blog
>>> post. "At Cycle, we've always been fans of extreme IT automation,
>>> but we needed to take this to the next level in order to monitor
>>> and manage every instance, volume, daemon, job, and so on in order
>>> for Nekomata to be an efficient 30,000 core tool instead of a big
>>> shiny on-demand paperweight."
>>>
>>> But problems did arise during the 30,000-core run.
>>>
>>> "You can be sure that when you run at massive scale, you are bound
>>> to run into some unexpected gotchas," Cycle notes. "In our case,
>>> one of the gotchas included such things as running out of file
>>> descriptors on the license server. In hindsight, we should have
>>> anticipated this would be an issue, but we didn't find that in our
>>> prelaunch testing, because we didn't test at full scale. We were
>>> able to quickly recover from this bump and keep moving along with
>>> the workload with minimal impact. The license server was able to
>>> keep up very nicely with this workload once we increased the number
>>> of file descriptors."
>>>
>>> Cycle also hit a speed bump related to volume and byte limits on
>>> Amazon's Elastic Block Store volumes. But the company is already
>>> planning bigger and better things.
>>>
>>> "We already have our next use-case identified and will be turning
>>> up the scale a bit more with the next run," the company says. But
>>> ultimately, "it's not about core counts or terabytes of RAM or
>>> petabytes of data. Rather, it's about how we are helping to
>>> transform how science is done."
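The file-descriptor gotcha above is a classic at this scale: thousands
of clients each holding a license checkout (an open socket) will blow
past the default per-process limit of 1024 on most Linux boxes. The
article doesn't say exactly how Cycle fixed it, but the usual
check-and-raise looks something like this sketch using Python's stdlib
resource module (the 65536 target is just an illustrative number):

  # Inspect the per-process open-file limit and raise the soft limit
  # toward the hard cap; exhausting this limit is what "running out of
  # file descriptors" means on a busy license or head server.
  import resource

  soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
  print("current limits: soft=%d hard=%d" % (soft, hard))

  target = 65536                 # illustrative; size it above the expected client count
  new_soft = min(target, hard)   # the soft limit may never exceed the hard limit
  resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
  print("soft limit raised to", new_soft)

Raising the hard limit itself, or making the change stick across
restarts, normally means editing /etc/security/limits.conf or the
daemon's init script and bouncing the service.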
--
Doug

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf