Re: [Beowulf] [External] Re: Rant on why HPC isn't as easy as I'd like it to be. [EXT]

Prentice Bisbal via Beowulf Mon, 27 Sep 2021 08:11:59 -0700

I'd be interested

Prentice


On 9/23/21 10:37 AM, Pizarro, Angel via Beowulf wrote:

DISCLOSURE: I work for AWS HPC Developer Relations in the services team. We develop AWS Batch, AWS ParallelCluster, NICE DCV, etc.

Lambda’s limits today are 128 MB to 10,240 MB (~10 GB) of memory, billed in 1 MB per ms increments, with a 15-minute maximum runtime per function invocation.
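
As a rough illustration of what those limits mean for a given job, the sketch below checks whether a job's memory and runtime fit inside the figures quoted above, and computes its memory-time product. The function names and the example job are hypothetical, the limits are the September 2021 numbers from this message (they may have changed since), and reading the billing unit as "allocated MB times elapsed ms, rounded up to whole ms" is an assumption.

```python
import math

# Limits as quoted in this message (September 2021); check current docs before relying on them.
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10_240
LAMBDA_MAX_RUNTIME_MS = 15 * 60 * 1000


def fits_in_lambda(memory_mb: int, runtime_ms: float) -> bool:
    """Return True if a job (hypothetical description) fits within the quoted limits."""
    # A job needing less than the minimum still fits; it is simply allocated 128 MB.
    return memory_mb <= LAMBDA_MAX_MEMORY_MB and runtime_ms <= LAMBDA_MAX_RUNTIME_MS


def billable_mb_ms(memory_mb: int, runtime_ms: float) -> int:
    """Memory-time product in MB*ms (assumed billing unit; no prices are applied)."""
    allocated_mb = max(memory_mb, LAMBDA_MIN_MEMORY_MB)
    return allocated_mb * math.ceil(runtime_ms)


if __name__ == "__main__":
    # Hypothetical job: 4 GB of memory, 10 minutes of runtime.
    print(fits_in_lambda(4096, 10 * 60 * 1000))
    print(billable_mb_ms(4096, 10 * 60 * 1000))
```

A job that fails the check would need to be split into smaller pieces or run on a different service.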

Would you all be interested in a hands-on, self-paced workshop on creating (or porting) an application to a serverless environment? E.g. a Monte-Carlo simulation, a genome alignment or variant call, or some other problem? We have some basic data processing documentation but nothing that speaks to a real-world HPC use case, and that is a gap I want to fill if folks are interested.
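
To give a feel for what such a port might look like, here is a minimal sketch of a Monte-Carlo estimate of pi split into independent chunks, each handled by a Lambda-style `handler(event, context)` function. The event fields (`samples`, `seed`), the helper names, and the local loop standing in for the fan-out are all assumptions for illustration; this is not taken from any AWS workshop material.

```python
import random


def count_hits(samples: int, seed: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # per-chunk seed keeps each chunk reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def handler(event, context):
    """Lambda-style entry point: one invocation processes one independent chunk."""
    samples = int(event["samples"])
    seed = int(event["seed"])
    return {"samples": samples, "hits": count_hits(samples, seed)}


def combine(results) -> float:
    """Reduce step: merge per-chunk counts into a single estimate of pi."""
    total = sum(r["samples"] for r in results)
    hits = sum(r["hits"] for r in results)
    return 4.0 * hits / total


if __name__ == "__main__":
    # Local stand-in for the fan-out; in a real deployment each call would be
    # a separate function invocation, sized to stay under the runtime limit.
    results = [handler({"samples": 100_000, "seed": s}, None) for s in range(8)]
    print(combine(results))
```

The point of the structure is that each chunk carries no state, so the number of chunks can be scaled up or down freely and only the small reduce step has to see all the results.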

Dr. Denis Bauer at CSIRO is also doing interesting things with serverless.


-angel

--

Angel Pizarro | Principal Developer Advocate, HPC @ AWS

*From: *Beowulf <beowulf-boun...@beowulf.org> on behalf of Guy Coates <guy.coa...@gmail.com>

*Date: *Thursday, September 23, 2021 at 8:46 AM
*To: *Tim Cutts <t...@sanger.ac.uk>
*Cc: *Beowulf <beowulf@beowulf.org>

*Subject: *RE: [EXTERNAL] [Beowulf] Rant on why HPC isn't as easy as I'd like it to be. [EXT]

*CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Out of interest, how large are the compute jobs (memory, runtime, etc.)? How easy is it to get them to fit into a serverless environment?


Thanks,


Guy

On Tue, 21 Sept 2021 at 13:02, Tim Cutts <t...@sanger.ac.uk> wrote:


    I think that’s exactly the situation we’ve been in for a long
    time, especially in life sciences, and it’s becoming more
    entrenched.  My experience is that the average user of our
    scientific computing systems has been becoming less technically
    savvy for many years now.

    The presence of the cloud makes that more acute, in particular
    because it makes it easy for the user to effectively throw more
    hardware at the problem, which reduces the incentive to make their
    code particularly fast or efficient.  Cost is the only brake on
    it, and in many cases I’m finding the PI doesn’t actually care
    about that.  They care that a result is being obtained (and it’s
    time to first result they care about, not time to complete all the
    analysis), and so they typically don’t have much time for those of
    us who are telling them they need to invest time up front
    developing and optimising efficient code.

    And cost is not necessarily the brake I thought it was going to be
    anyway.  One recent project we’ve done on AWS has impressed me a
    great deal.  It’s not terribly CPU efficient, and would doubtless,
    with sufficient effort, run much more efficiently on premise.  But
    it’s extremely elastic in its nature, and so a good fit for the
    cloud.   Once a week, the project has to completely re-analyse the
    600,000+ COVID genomes we’ve sequenced so far, looking for new
    branches in the phylogenetic tree, and to complete that analysis
    inside 8 hours.   Initial attempts to naively convert the HPC
    implementation to run on AWS looked as though they were going to
    be very expensive (~$50k per weekly run).  But a fundamental
    reworking of the entire workflow to make it as cloud native as
    possible, by which I mean almost exclusively serverless, has
    succeeded beyond what I expected.  The total cost is <$5,000 a
    month, and because there is essentially no statically configured
    infrastructure at all, the security is fairly easy to be
    comfortable about. And all of that was done with no detailed
    thinking about whether the actual algorithms running in the
    containers are at all optimised in a traditional HPC sense.  It’s
    just not needed for this particular piece of work.  Did it need
    software developers with hardcore knowledge of performance
    optimisation? No.  Was it rapid to develop and deploy?  Yes.  Is
    the performance fast enough for UK national COVID variant
    surveillance?  Yes.  Is it cost effective? Yes.  Sold!  The one
    thing it did need was knowledgeable cloud architects, but the
    cloud providers can and do help with that.
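
For a sense of scale, the figures quoted above can be put on a common footing with some back-of-the-envelope arithmetic. The sketch below assumes 52 runs a year spread evenly over 12 months, takes the "<$5,000 a month" figure as an upper bound, and treats the 600,000 genomes and 8-hour window as exact; the function names are hypothetical.

```python
def monthly_cost_of_weekly_runs(cost_per_run: float) -> float:
    """Average monthly cost of a weekly job, assuming 52 runs spread over 12 months."""
    return cost_per_run * 52 / 12


def cost_reduction_factor(naive_per_run: float, serverless_per_month: float) -> float:
    """How many times cheaper the serverless workflow is, on a monthly basis."""
    return monthly_cost_of_weekly_runs(naive_per_run) / serverless_per_month


def required_throughput(genomes: int, hours: float) -> float:
    """Average genomes per second needed to finish inside the time window."""
    return genomes / (hours * 3600)


if __name__ == "__main__":
    # Figures from the message: ~$50k per weekly run (naive port),
    # <$5,000 per month (serverless), 600,000+ genomes inside 8 hours.
    print(round(monthly_cost_of_weekly_runs(50_000)))       # 216667
    print(round(cost_reduction_factor(50_000, 5_000), 1))   # 43.3
    print(round(required_throughput(600_000, 8), 2))        # 20.83
```

Under those assumptions the reworked workflow is at least roughly 40 times cheaper than the naive port, while sustaining an average of about 21 genomes per second during the weekly run.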

    Tim

--

    Tim Cutts
    Head of Scientific Computing
    Wellcome Sanger Institute



        On 21 Sep 2021, at 12:24, John Hearns <hear...@gmail.com
        <mailto:hear...@gmail.com>> wrote:

        Some points well made here. I have seen in the past job
        scripts passed on from graduate student to graduate student -
        the case I am thinking of was an Abaqus script for 8 core
        systems, being run on a new 32 core system. Why WOULD a
        graduate student question a script given to them - which
        works? They should be getting on with their science. I guess
        this is where Research Software Engineers come in.

        Another point I would make is about modern processor
        architectures, for instance AMD Rome/Milan. You can have
        different Numa Per Socket options, which affect performance.
        We set the preferred IO path - which I have seen myself to
        have an effect on latency of MPI messages. If you are not
        concerned about your hardware layout you would just go ahead
        and run, missing a lot of performance.

        I am now going to be controversial and comment that over in
        Julia land the pattern seems to be these days people develop
        on their own laptops, or maybe local GPU systems. There is a
        lot of microbenchmarking going on. But there seems to be not a
        lot of thought given to CPU pinning or what happens with
        hyperthreading. I guess topics like that are part of HPC
        'Black Magic' - though I would imagine the low latency crowd
        are hot on them.

        I often introduce people to the excellent lstopo/hwloc
        utilities which show the layout of a system. Most people are
        pleasantly surprised to find this.
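
As a small illustration of the CPU pinning mentioned above, the sketch below uses Python's standard `os.sched_getaffinity` and `os.sched_setaffinity` calls to inspect and restrict the CPUs a process may run on. These calls are only available on Linux, hence the `hasattr` guard; the helper names are hypothetical, and picking the lowest-numbered CPU is an arbitrary choice for the example. It does not show NUMA domains or hyperthread siblings, for which lstopo remains the better tool.

```python
import os


def allowed_cpus() -> set:
    """CPUs the current process is allowed to run on (Linux-only call)."""
    return os.sched_getaffinity(0)  # pid 0 means "the calling process"


def pin_to_cpu(cpu: int) -> set:
    """Restrict the current process to a single CPU and return the new mask."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        before = allowed_cpus()
        after = pin_to_cpu(min(before))  # arbitrary choice: lowest-numbered CPU
        print(f"allowed before: {sorted(before)}, after pinning: {sorted(after)}")
        os.sched_setaffinity(0, before)  # restore the original mask
    else:
        print("CPU affinity calls are not available on this platform")
```

In practice the CPU to pin to would be chosen from the topology that lstopo reports, for example to keep a process on one NUMA domain or to avoid sharing a physical core with a hyperthread sibling.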

    -- The Wellcome Sanger Institute is operated by Genome Research
    Limited, a charity registered in England with number 1021457 and a
    company registered in England with number 2742969, whose
    registered office is 215 Euston Road, London, NW1 2BE.

    _______________________________________________
    Beowulf mailing list, Beowulf@beowulf.org
    <mailto:Beowulf@beowulf.org> sponsored by Penguin Computing
    To change your subscription (digest mode or unsubscribe) visit
    https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
    <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>


--

Dr. Guy Coates
+44(0)7801 710224


