It would be good to know what levels of efficiency the applications achieved with respect to FLOP/s and GB/s, and what the typical node count was for the runs. Those figures could then be compared against the current PF/s systems.
Joshua

------ Original Message ------
Received: 05:49 PM CEST, 04/05/2013
From: Eugen Leitl <[email protected]>
To: [email protected]
Subject: [Beowulf] Revelations on Roadrunner's Retirement

>
> http://www.hpcwire.com/hpcwire/2013-04-04/revelations_on_roadrunner_s_retirement.html?featured=top
>
> Revelations on Roadrunner's Retirement
>
> Nicole Hemsoth
>
> Earlier this week we reported on the decommissioning of the Roadrunner
> supercomputer at Los Alamos National Laboratory, which was being shuttered
> following a stint of fame as the first system to break the petascale barrier
> back in 2008.
>
> According to Paul Henning from the computational physics division at Los
> Alamos, Roadrunner’s checkout made big news, but the end of the line for the
> super was well-planned, if not right on schedule.
>
> The system served its purpose chewing a bevy of mostly classified and some
> key civilian code. However, in the end, the combination of a finite contract,
> an extinct chip, the cost of crumpling up code to fit into IBM’s Cell, and
> the promise of swifter, more efficient technologies were main factors in the
> planned clipped lifecycle of the petaflop pioneer.
>
> “Rather than think of these machines as physical entities, we think of them
> as projects,” he explained. “At the beginning of the Roadrunner acquisition
> we laid out a project lifetime for this—and that lifetime considered a number
> of things, including the cost of maintenance, power, vendor and licensing
> contracts, and how we would upgrade the system.”
>
> Henning detailed that the support contract with IBM was up and since they
> don’t even produce the core of the machine’s architecture, the Cell, the
> question of even scrounging up some spare parts would have presented a rather
> tricky issue. The retirement party had been planned years ago anyway, but
> there are some meaty learning opportunities to glean from the scrap metal.
>
> When any system at the lab is shuttered, the autopsy, which looks at
> everything from the integrity of the memory and OS to the more nuts and bolts
> physical properties, is performed. A key finding of the post-mortem revolves
> around the condition of the boxes after five years of heat, wear and
> tear—it’s here where the materials analysis begins. It’s given the renowned
> materials science team at the center an insider’s view into the real stress
> on systems after high-yield, high-heat production—and from what we read
> between the lines, these boxes are maxed out.
>
> Then again, there were never any plans to build the system out to new glory
> ala the Jaguar to Titan transformation. Anyway, even if the hardware wasn’t
> on its last, weak leg, considering they’d have to retrofit the entire system
> since IBM would return a 404 on their build-out needs, it makes sense that
> they’d want to rip…and of course, replace.
>
> Currently, Los Alamos has sent its applications on a redirect course to the
> smaller, slightly more efficient and roughly performance-equivalent Cielo
> system, which is housed in the same space as the now-defunct Roadrunner.
> Henning said the developer-friendly architecture saves time and money on code
> retooling, ostensibly while they try to fit something new into their
> environment.
>
> And so here is where things get interesting. Because we can speculate on what
> Los Alamos might dream up to fill the 6,000 square foot gap left behind.
> That’s a pretty large spate of empty space for any upstart system to settle
> into. Titan’s sprawl is right under 5,000 square feet and a lot of flops have
> fit in less than that.
>
> There are a few hints at what might sit on the charred spot Roadrunner once
> occupied post-ripdown. However, it’s worth noting that a quick perusal of the
> NNSA’s procurement plans for the next year include something on the order of
> a $50 million to (yes) one billion dollar project, which is currently
> accepting proposals. And it’s kind of hard to imagine what else would be
> filed under tech procurements to that monetary tune. If any of you know
> anything about this, that comments section down there looks awfully
> empty….(hint, hint).
>
> All speculation aside, it looks like we’ll find out soon enough—probably
> later this year—just what will turn off that vacancy sign at the lab. Until
> then, the Roadrunner story serves as a reminder about how quickly the tides
> of this type of tech shift and leave superhero machines drifting into
> forgotten waters.
>
> When national labs and large HPC sites sit down to spill ink on new system
> designs, they’re hedging their bets on what future technologies will look
> like. It’s rare, unless folks are on a TACC/Stampede-like course to go from
> ground to super in a tick over a year, to know what innovations on the
> architecture, efficiency or acceleration front will yield big
> price-performance dividends. So at the time that Los Alamos set about
> architecting Roadrunner based on the very unique Cell approach, they were
> placing their bets on the future of that technology.
>
> Since that development cycle, the rise of GPU acceleration, the introduction
> of the promising Phi, and some efficiency tweaks on the software side have
> rendered some of what made Roadrunner shine seem rather dated. It’s now
> possible to get more compute power in a smaller power envelope…and with a lot
> less in the way of programming hassle, as well, notes Henning. However, for
> the NNSA and Los Alamos, whatever the clandestine code was they cooked around
> the Cell, it must have been worth the effort on the retooling side.
>
> Although the story of the Roadrunner being forced into retirement found its
> way into a number of mainstream tech media stories over the course of the
> week, this is a pretty standard order of operations for large HPC centers,
> especially national labs. Henning stressed that the shutdown of the
> once-famous system is not unlike the series of other supers they’ve shuttered
> in succession at the center. They build a plan for acquisition, see a machine
> run its course, learn from it post-mortem and shuttle it off in parts to make
> way for something fresh.
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
