Re: [discuss] Software Carpentry style lesson for Conda

David Pugh Thu, 13 Jun 2019 06:31:48 -0700

Maxime,

Thanks for continuing to engage on this topic, as it gets to one of my
primary motivations for developing this lesson. Some background: at KAUST
we have three university-managed clusters, namely two heterogeneous
commodity clusters and one Cray XC40 HPC system. These clusters are all
managed using traditional HPC tools (e.g., a module system), and our
systems team has also provided a comprehensive list of pre-compiled Python
wheels optimized for the various compute nodes. Conda is also already
available on all three of our clusters via the module system. This
traditional HPC setup does a good job of supporting traditional HPC users
and use cases, and those users are very happy at present.

However, the majority of our users are not traditional HPC users; they have
data science and machine learning workflows that are not well supported by
traditionally managed HPC systems. While these users are able to do the
majority of their work on their laptops/workstations, they would benefit
from being able to scale their workflows on our clusters.

* To use our traditionally managed HPC clusters, these non-traditional HPC
users must essentially maintain two workflows and two software
environments: one for their laptops/workstations and one for our clusters.
* I put high value on maintaining the portability of my data science and
machine learning workflows: I want to be able to develop on my
laptop/workstation and then move my workflow to our clusters or the public
cloud only when necessary.
* Many (most?) of our users will never encounter traditionally managed HPC
hardware after they leave KAUST; as such, I would prefer to train our users
on tools that will set them up for success in public (or similarly managed
private) cloud environments.

I view Conda, eventually combined with Docker (for which I am also
developing a lesson), as at least a partial solution to these concerns.

Instead of a section on "when not to use Conda", I am thinking of crafting
an episode on "when and how to build bespoke conda packages." I do think,
though, that I should make it clear that if you have a traditional HPC
workflow, then you should "consult your local experts", as conda might not
be the best solution.

Anyway, hopefully this provides a bit of context and motivation for some of
the lesson design choices that I have made thus far.

Thanks again for the feedback; it is much appreciated!

D

On Thu, Jun 13, 2019 at 3:48 PM Maxime Boissonneault <
[email protected]> wrote:

> On 2019-06-12 10:56 AM, Michael Sarahan wrote:
>
> That's a good point, but rather than say "don't use conda at all" - that's
> more reason to have custom channels where conda is set up to comply with
> those needs.  Conda need not be mutually exclusive with these things, but
> it does take some setup to get them working together.
>
> That's not what I'm saying. What I'm saying is consult with your local
> experts.
>
> On our clusters, *we* tell users don't use conda.
>
> We provide a comprehensive list of precompiled python wheels. There is
> absolutely no need for conda in 99% of the cases.
>
> I don't see why we would support custom conda channels when we can just as
> well support python wheels that don't require conda.
>
>
> Maxime
>
>
>
> Saying "don't use conda at all" is ignoring the work that has to happen
> either way.  Either you have to reproduce what conda is providing somehow,
> or you have to make conda use the part on the system side.  That's
> definitely a case-by-case scenario for everyone, and we need to document
> both paths.
>
> For your example of MPI, conda packages are set up to explicitly require
> some MPI implementation where necessary.  That package can come from an
> actual conda MPICH package, or it can come from a known binary-compatible
> system installation that has a conda package set up to reference it.  Conda
> is not dogmatic about being hermetic (unlike, say, bazel).  Binary
> compatibility with external libraries can be pretty tricky, though.
>
> On Wed, Jun 12, 2019 at 9:48 AM Maxime Boissonneault <
> [email protected]> wrote:
>
>> Hi,
>> How about including a part about when *not* to use Conda?
>>
>> In particular, if they are going to be computing on a supercomputer, they
>> should consult with their cluster specialists first.
>> Conda works well on somebody's desktop, but it creates a lot of problems
>> on supercomputers, because it does crazy stuff like installing MPI by
>> itself instead of relying on staff-installed modules and software packages.
>>
>> Cheers,
>>
>> Maxime
>>
>>
>> On 2019-06-12 9:49 AM, David Pugh wrote:
>>
>> All,
>>
>> I have developed a Software Carpentry style lesson for Conda and would be
>> keen to get feedback from the community!
>>
>> Website:
>>
>> https://kaust-vislab.github.io/introduction-to-conda-for-data-scientists/
>>
>> Repo:
>>
>> https://github.com/kaust-vislab/introduction-to-conda-for-data-scientists
>>
>> Thanks and look forward to hearing from you!
>>
>> David
>>
>>
>> *The Carpentries <https://carpentries.topicbox.com/latest>* / discuss /
> see discussions <https://carpentries.topicbox.com/groups/discuss> +
> participants <https://carpentries.topicbox.com/groups/discuss/members> + 
> delivery
> options <https://carpentries.topicbox.com/groups/discuss/subscription>
> Permalink
> <https://carpentries.topicbox.com/groups/discuss/Tb12fc97e5ee621f2-M1342d710fc4e431e3998ff41>
>
>
> --
> ---------------------------------
> Maxime Boissonneault
> Computing analyst - Calcul Québec, Université Laval
> Chair - Calcul Québec Research Support Coordination Committee
> Team lead - Research Support National Team, Compute Canada
> Software Carpentry instructor
> Ph.D. in physics
>
>

------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/T32ac38bed70dbd29-Ma06c2f37eeef6a2257b1cfcc
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
