> -----Original Message-----
> From: Mark Hahn [mailto:h...@mcmaster.ca]
> Sent: Thursday, October 21, 2010 3:01 PM
> To: Lux, Jim (337C)
> Cc: Beowulf Mailing List
> Subject: RE: [Beowulf] how Google warps your brain
> 
> >> I'm pretty convinced that, ignoring granularity or political issues, shared
> >> resources save a lot in leadership, infrastructure, space, etc.
> >
> > OTOH, it's just those granularity and cost accounting issues that led to
> > Beowulfs being built in the first place.
> 
> I'm not really sure I understand what you mean.  by "granularity", I just
> meant that you can't really have fractional sysadmins, and a rack with 1 node
> consumes as much floor space as a full rack.  in some sense, smaller clusters
> have their costs "rounded down" - there's a size beneath which you tend to
> avoid paying for power, cooling, etc.  perhaps that's what you meant by
> cost-accounting.


That's exactly what I meant.  In any organization, there's a certain level of 
detail below which reporting isn't generally required.  Likewise, there's a 
certain dollar threshold below which a purchase doesn't have to climb the 
signature chain.  For institutionally provided services on a chargeback basis 
(e.g. phone calls, CPU seconds on the mainframe, etc.) the expectation is that 
costs are tracked to the penny (or, as the federal rules have it, you have to 
make sure they are allowable, allocable, and reasonable).  For things bought in 
chunks, the resolution requirement is typically at the "purchase" level (e.g. 
nobody makes me allocate a $1000 computer to 15 different cost accounts, but I 
would have to account for disk space on the institutional server account by 
account).

(It's a diminishing-returns issue: it's cheap to define which cost accounts 
pay for which disk directories, but it's not cheap to split Purchase Orders 
between accounts.  A sketch of the cheap case is below.)
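
To see why the directory case is the cheap one: the whole accounting mechanism 
amounts to a lookup table plus a disk-usage pass.  A minimal sketch in Python 
(the paths, account numbers, and rate are invented for illustration):

  import os

  # Invented mapping of directory trees to cost accounts.
  ACCOUNT_FOR_DIR = {
      "/projects/radar":     "1234-A",
      "/projects/telemetry": "5678-B",
  }

  RATE_PER_GB_MONTH = 0.50  # assumed chargeback rate, $/GB/month

  def tree_bytes(path):
      """Total bytes under path (a crude 'du -s')."""
      total = 0
      for root, _dirs, files in os.walk(path):
          for name in files:
              try:
                  total += os.path.getsize(os.path.join(root, name))
              except OSError:
                  pass  # file vanished mid-walk; skip it
      return total

  for path, account in ACCOUNT_FOR_DIR.items():
      gb = tree_bytes(path) / 1e9
      print("%s: %6.1f GB -> $%.2f" % (account, gb, gb * RATE_PER_GB_MONTH))

Splitting a Purchase Order, by contrast, means humans negotiating percentages 
up front and reconciling them afterward.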

> 
> but do you think these were really important at the beginning?  to me,
> beowulf is "attack of the killer micro" applied to parallelism.  that is,
> mass-market computers that killed the traditional glass-house boxes:
> vector supers, minis, eventually anything non-x86.  the difference was
> fundamental (much cheaper cycles), rather than these secondary issues.

I think they were.  If you needed horsepower, you could either fight the 
budget battle to buy CPU seconds on the big iron, OR you could buy your own 
supercomputer and skip the significant administrative time spent setting up, 
reconciling, and reporting on those CPU seconds across all your projects.  And 
because the glass-house box is very visible and high value, there is a lot of 
oversight to "make sure that we are effectively using the asset" and "that the 
operations cost is fairly allocated among the users".

Particularly in places where there is strict cost accounting on purchases but 
not so strict on labor (e.g. your salary is already paid out of some generic 
bucket), this could be a big driver: you could spend your own time essentially 
for free.


> 
> > I suspect (nay, I know, but just can't cite the references) that this sort
> >of issue is not unique to HPC, or even computing and IT.  Consider
> >libraries, which allow better utilization of books, at the cost of someone
> >else deciding which books to have in stock.
> 
> well, HPC is unique in scale of bursting.  even if you go on a book binge,
> there's no way you can consume orders of magnitude more books than I can,
> or compared to your trailing-year average.  but that's the big win for HPC
> centers - if everyone had a constant demand, a center would deliver only
> small advantages, not even much better than a colo site.

Yes.. that's why the library/book model isn't as good as it could be. 

> 
> > And consider the qualitatively
> >different experience of "browsing in the stacks" vs "entering the call
> >number in the book retrieval system".. the former leads to serendipity as
> >you turn down the wrong aisle or find a mis-shelved volume; the latter is
> >faster and lower cost as far as a "information retrieval" function.
> 
> heh, OK.  I think that's a bit of a stretch, since your serendipity would
> not scale with the size of the library, but mainly with its messiness ;)
> 
> >get paid for. And this is because they've bought a certain amount of
> >computational resources for me, and leave it up to me to use or not, as I
> >see fit.
> 
> I find myself using my desktop more and more as a terminal - I hardly
> ever run anything but xterm and google chrome.  as such, I don't mind
> that it's a terrible old recycled xeon from a 2003 project.  it would seem
> like a waste of money to buy something modern, (and for me to work locally)
> since there are basically infinite resources 1ms away as the packet flies...

And as long as there's not a direct cost to you (or your budget) of incremental 
use of those remote resources, then what you say is entirely true.  But if you 
were paying for traffic, you'd think differently.

When I was in Rome a year ago for a couple of weeks, I had one of those USB 
data modems.  You pay by the kilobyte, so you do all your work offline, fire 
up the modem, transfer your stuff, and shut it down.  Shades of dial-up and 
paying by the minute of connect time.  All of a sudden you get real interested 
in how much bandwidth all that JavaScript and cool formatting, flying back and 
forth to render the pretty webmail page, actually consumes.  And the 
convenient "let's send packets to keep the VPN tunnel alive" feature is really 
unpleasant, because you can literally watch the money meter tick up while 
you're sitting there thinking.
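
Just to put numbers on the keepalive thing (the interval, packet size, and 
tariff below are assumed round numbers, not figures from my actual bill):

  # Idle-tunnel arithmetic; all figures are assumptions for illustration.
  KEEPALIVE_INTERVAL_S = 10        # one keepalive every 10 seconds
  BYTES_PER_KEEPALIVE  = 2 * 100   # ~100 bytes each direction
  EUR_PER_MB           = 0.30      # metered pay-by-the-kilobyte rate

  bytes_per_hour = (3600.0 / KEEPALIVE_INTERVAL_S) * BYTES_PER_KEEPALIVE
  cost_per_hour  = bytes_per_hour / 1e6 * EUR_PER_MB
  print("idle tunnel: %.0f KB/h -> %.4f EUR/h"
        % (bytes_per_hour / 1024, cost_per_hour))

Pennies per hour just to sit there with the tunnel up, which is real money 
over a two-week trip.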

With a cluster of my own, the entire cost is essentially fixed whether I use 
it or not, so I can "fool around" and not worry about whether I'm being 
efficient.  Which gets back to CS classes in the 70s, where you had a limited 
number of runs/seconds for the quarter, so great emphasis was put on "desk 
checking" as opposed to interactive development... I'm not sure code quality 
wasn't higher back then, but overall productivity was lower, and I'd hate to 
give up Matlab (and I loved APL on an IBM 5100).  But then, I'm in what is 
essentially a perpetual prototyping environment: a good part of the 
computational work I need is iterations on the algorithm implementation, more 
than the outputs of that algorithm.

If I were like my wife, who does banking IT, or the folks doing science data 
processing from satellites, all in a production environment, I'd probably say 
the big data center is a MUCH better way to go.  They need the efficiency, and 
they have large enough volume to justify the fine-grained accounting (because 
1% of $100 million is a lot bigger in absolute terms than 10% of $100k, so the 
big job can afford to put a full-time person on it).
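
The arithmetic behind that, with an assumed $150k fully loaded cost for the 
full-time person:

  # Waste avoided by careful accounting: big shop vs. small shop.
  big_budget, small_budget = 100_000_000, 100_000   # $100M vs. $100k
  big_waste   = 0.01 * big_budget     # 1% slop on the big budget:  $1,000,000
  small_waste = 0.10 * small_budget   # 10% slop on the small one:  $10,000
  accountant  = 150_000               # assumed loaded cost of one person

  print(big_waste > accountant)     # True:  the big shop comes out ahead
  print(small_waste > accountant)   # False: the small shop never would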

So, my wife needs the HPC data center and a staff of minions.   I want the 
personal supercomputer which makes my life incrementally easier, but without 
having to spend time dealing with accounting for tens of dollars.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
