Hi, Jörg. I would say five years is an OK lifetime. If you want to be aggressive about your lifecycle, a case can be made for three years.
Things keeping a cluster running longer:
- lack of funding
- one-time cost
- lack of communication
- it isn't broken
- researchers don't pay for electricity, cooling, or facilities
- users do not want to migrate
- some applications may be difficult to map to new hardware / OS

Things that should convince you to update:
- two new servers can replace an entire rack of 10-year-old hardware
- the savings in electricity alone could equal the new hardware cost (see the
  back-of-envelope sketch after the quoted message below)
- space is limited: new in, old out, with temporary overlap
- I/O and per-core performance are way up!
- warranty support means staff AND researchers sleep at night
- refreshing the OS and software is a very good thing
- new car smell

That said, I know clusters that won't be turned off until a data centre
migration happens. I think the key here is to set expectations and have an SLA
in place before deploying anything.

Cheers.

On Sun 04/27/14 09:45AM +0100, Jörg Saßmannshausen wrote:
> Dear all,
>
> In some of the discussions here I came across the 'lifespan of a cluster'
> argument. What I was wondering is: how long is that in HPC for number
> crunching? Is it 3 years (end of warranty), 5 years (making good use of the
> hardware), or longer?
>
> The reason behind my asking is: I have clusters here which are 10 years old,
> and quite a number of them, and I would like to get a scheme implemented to
> have the hardware replaced every X years, with X being the 'lifespan of a
> cluster'. One of the various options currently being thrown around is to
> move from my local data centre (3 rooms, one purely for the backup/file
> storage and the other two for HPC) into the College shared data centre
> (single room). IF we do that, I am a bit worried that I will get told in 5
> years' time (for the sake of the argument): your clusters are end of life,
> you have to get rid of them as we need the space / they are consuming too
> much energy.
>
> Thus, I am looking for answers to: how long are clusters typically run, and
> how is that handled in other shared data centres?
>
> The current funding situation here means it is difficult, if not impossible,
> to get HPC hardware from funding agencies. Even if you get a bit of money,
> it is just enough for a new node. So most clusters have grown a bit
> organically, which makes administration difficult if you really want to get
> the best out of what you paid for. In an ideal world, I would like to have
> the kit replaced every 5 years: old kit out, new kit in. In the real world,
> I have to run the kit until it falls apart and hope that the Principal
> Investigator, i.e. the owner of the cluster, has some money to replace the
> old/broken nodes. Hence the questions, so I can build up a good case for
> change.
>
> I hope that makes sense to you.
>
> All the best from an overcast London!
>
> Jörg
>
> --
> *************************************************************
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: [email protected]
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
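P.S. To put rough numbers on the electricity point above, here is a minimal
back-of-envelope sketch in Python. Every figure in it is a placeholder made up
for illustration (node count, wattages, PUE, energy price, hardware cost);
substitute your own measurements and quotes before using it to argue for a
refresh.

# Back-of-envelope: does the electricity saved by replacing an old rack
# with two new servers pay for the new hardware over the refresh cycle?
# All constants below are hypothetical placeholders, not measured values.

OLD_NODES = 20              # nodes in the old rack (assumption)
OLD_WATTS_PER_NODE = 350    # average draw per old node, watts (assumption)
NEW_SERVERS = 2             # replacement servers
NEW_WATTS_PER_SERVER = 500  # average draw per new server, watts (assumption)
KWH_PRICE = 0.15            # energy price per kWh (assumption)
PUE = 1.8                   # data-centre overhead for cooling etc. (assumption)
LIFETIME_YEARS = 5          # planned refresh cycle
NEW_HW_COST = 20000         # purchase price of the new servers (assumption)

HOURS_PER_YEAR = 24 * 365

def energy_cost(watts, years):
    """Electricity cost of a constant load over `years`, including PUE."""
    kwh = watts / 1000 * HOURS_PER_YEAR * years * PUE
    return kwh * KWH_PRICE

old_cost = energy_cost(OLD_NODES * OLD_WATTS_PER_NODE, LIFETIME_YEARS)
new_cost = energy_cost(NEW_SERVERS * NEW_WATTS_PER_SERVER, LIFETIME_YEARS)
savings = old_cost - new_cost

print(f"Old rack electricity over {LIFETIME_YEARS} years: {old_cost:,.0f}")
print(f"New servers electricity over {LIFETIME_YEARS} years: {new_cost:,.0f}")
print(f"Savings: {savings:,.0f}  (new hardware cost: {NEW_HW_COST:,})")

With those made-up numbers the five-year electricity savings come out well
above the hardware price, which is the point of the bullet above; the
conclusion obviously stands or falls with your actual power draw and tariff.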
--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
