Dear Chris,

further to your email:
> - And if miracles occur and they do have expert level linux people then
>   more often than not these people are overworked or stretched in many
>   directions

This is exactly what happened to me at my old workplace: pulled in too
many different directions.

I am a bit surprised about the ZFS experiences. Although I did not have
petabytes of storage and did not generate 300 TB per week, I did run a
fairly large storage installation on XFS and ext4 for backups and
provisioning of file space. Some of it ran on old hardware (please sit
down, I am talking about me messing around with SCSI cables) and I
gradually upgraded it to newer kit. So I am not quite sure what went
wrong with the ZFS storage here.

However, there is a common trend, at least from what I observe here in
the UK, to outsource problems: pass the buck to somebody else and pay
for it. I am personally still more of an in-house expert than an
outsourced person who may or may not be able to understand what you are
doing. I should add that I work in academia and know little about the
commercial world here. Having said that, my friends in commerce tell me
that their companies like to outsource because it is 'cheaper'.

I agree about the Linux expertise. I think I am one of only two Linux
admins at my present workplace. The official line is: we do not support
Linux (but we teach it). Anyhow, I don't want to digress here too much.
However, "...do HPC work in commercial environments where the skills
simply don't exist onsite." Are we a dying art?

My 1 shilling here from a still cold and dark London. (A small sketch of
Jeff's "text files, not node images" approach follows below, after your
mail.)

Jörg

On Wednesday, 2 May 2018, 16:19:48 BST, Chris Dagdigian wrote:

> Jeff White wrote:
> > I never used Bright. Touched it and talked to a salesperson at a
> > conference but I wasn't impressed.
> >
> > Unpopular opinion: I don't see a point in using "cluster managers"
> > unless you have a very tiny cluster and zero Linux experience. These
> > are just Linux boxes with a couple of applications (e.g. Slurm)
> > running on them. Nothing special. xcat/Warewulf/Scyld/Rocks just get
> > in the way more than they help, IMO. They are mostly crappy wrappers
> > around free software (e.g. ISC's dhcpd) anyway. When they aren't,
> > it's proprietary trash.
> >
> > I install CentOS nodes and use
> > Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs
> > and software. This also means I'm not stuck with "node images" and
> > can instead build everything as plain old text files (read: write
> > SaltStack states), update them at will, and push changes any time.
> > My "base image" is CentOS and I need no "baby's first cluster" HPC
> > software to install/PXE-boot it. YMMV.
>
> Totally legit opinion and probably not unpopular at all given the user
> mix on this list!
>
> The issue here is assuming a level of domain expertise with Linux,
> bare-metal provisioning, DevOps and (most importantly) HPC-specific
> config stuff that may be pervasive or easily available in your
> environment but is often not easily available in a
> commercial/industrial environment where HPC or "scientific computing"
> is just another business area that a large central IT organization
> must support.
>
> If you have that level of expertise available then the self-managed
> DIY method is best.
> It's also my preference.
>
> But in the commercial world where HPC is becoming more and more
> important, you run into stuff like:
>
> - Central IT may not actually have anyone on staff who knows Linux
>   (more common than you would expect; I see this in Pharma/Biotech
>   all the time)
>
> - The HPC user base is not given budget or resources to self-support
>   their own stack because of a drive to centralize IT ops and support
>
> - And if they do have Linux people on staff, they may be novice-level
>   people or have zero experience with HPC schedulers, MPI fabric
>   tweaking and app needs (the domain stuff)
>
> - And if miracles occur and they do have expert-level Linux people,
>   then more often than not these people are overworked or stretched
>   in many directions
>
> So what happens in these environments is that organizations will
> willingly (and happily) pay commercial pricing and adopt closed-source
> products if they can deliver a measurable reduction in administrative
> burden, operational effort or support burden.
>
> This is where Bright, Univa etc. all come in -- you can buy stuff from
> them that dramatically reduces the care and feeding that onsite/local
> IT has to manage.
>
> Just having a vendor to call for support on Grid Engine oddities makes
> the cost of Univa licensing worthwhile. Just having a vendor like
> Bright be on the hook for "cluster operations" is a huge win for an
> overworked IT staff that does not have Linux or HPC specialists
> on staff or easily available.
>
> My best example of "paying to reduce operational burden in HPC" comes
> from a massive, well-known genome shop in the Cambridge, MA area. They
> often tell this story:
>
> - 300 TB of new data generation per week (many years ago)
>
> - One of the initial storage tiers was ZFS running on commodity server
>   hardware
>
> - Keeping the DIY ZFS appliances online and running took the FULL-TIME
>   efforts of FIVE STORAGE ENGINEERS
>
> They realized that staff support was not scalable with DIY ZFS at
> 300 TB/week of new data generation, so they went out and bought a
> giant EMC Isilon scale-out NAS platform.
>
> And you know what? After the Isilon NAS was deployed, the management
> of *many* petabytes of single-namespace storage was handled by the IT
> Director in his 'spare time' -- and the five engineers who used to do
> nothing but keep ZFS from falling over were reassigned to more
> impactful and presumably more fun/interesting work.
>
> They actually went on stage at several conferences and told the story
> of how Isilon allowed senior IT leadership to manage petabyte volumes
> of data "in their spare time" -- this was a huge deal and really
> resonated. It really reinforced for me how in some cases it's actually
> a good idea to pay $$$ for commercial stuff if it delivers gains in
> ops/support/management.
>
> Sorry to digress! This is a topic near and dear to me. I often have to
> do HPC work in commercial environments where the skills simply don't
> exist onsite. Or more commonly -- they have budget to buy software or
> hardware, but they are under a hiring freeze and are not allowed to
> bring in new humans.
>
> Quite a bit of my work on projects like this is helping people make
> sober decisions regarding "build" or "buy" -- and in those
> environments it's totally clear that for some things it makes sense to
> pay for an expensive, commercially supported "thing" that they don't
> have to manage or support themselves.
>
> My $.02 ...
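P.S. For the archives, here is roughly what Jeff's "plain old text
files" approach can look like. This is only a toy sketch in Python: the
host names, paths and service name are made up for the illustration,
and a real site would let Salt/Ansible states do the pushing rather
than hand-rolled ssh/scp. The point is simply that each node's
definition lives in versioned text files, not in a golden image.

    #!/usr/bin/env python3
    # Toy sketch of "configs as plain text, pushed to nodes": copy a
    # config file to each node and restart the service it belongs to.
    # Hostnames, paths and the service are hypothetical; in real life
    # this would be one Salt/Ansible state applied to all nodes.
    import subprocess

    NODES = [f"node{i:03d}" for i in range(1, 5)]  # hypothetical nodes
    CONF = "slurm.conf"                            # plain text, kept in git

    def push(node: str) -> None:
        # scp the config over and restart slurmd; ssh keys are assumed
        # to be in place, and /etc/slurm/ is assumed as the config dir
        subprocess.run(["scp", CONF, f"root@{node}:/etc/slurm/{CONF}"],
                       check=True)
        subprocess.run(["ssh", f"root@{node}",
                        "systemctl", "restart", "slurmd"], check=True)

    for node in NODES:
        push(node)
        print(f"{node}: updated")

With Salt the whole loop collapses to something like
"salt '*' state.apply slurm" run once on the master. And for scale on
the storage story: 300 TB/week works out to roughly 15 PB/year, which
makes the five-engineer DIY ZFS figure rather less surprising.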
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf