Maxime,

Thanks for continuing to engage on this topic, as it gets to one of my primary motivations for developing this lesson.

Some background. At KAUST we have three university-managed clusters: two heterogeneous commodity clusters and one Cray XC40 HPC system. These clusters are all managed using traditional HPC tools (e.g., the module system), and our systems team has also provided a comprehensive list of pre-compiled Python wheels optimized for the various compute nodes. Conda is also already available on all three of our clusters via the module system. This traditional HPC setup does a good job of supporting traditional HPC users and use cases, and those users are very happy at present.
However, the majority of our users are not traditional HPC users; they have data science and machine learning workflows that are not well supported by traditionally managed HPC systems. While these users are able to do the majority of their work on their laptops/workstations, they would benefit from being able to scale their workflows on our clusters.

* In order for these non-traditional HPC users to use our HPC-managed clusters, they must basically maintain two workflows and software environments: one for their laptops/workstations and one for our clusters.
* I put a high value on maintaining the portability of my data science and machine learning workflows: I want to be able to develop on my laptop/workstation and then move my workflow to our clusters or the public cloud only when necessary.
* Many (most?) of our users will never encounter traditional HPC-managed hardware after they leave KAUST; as such, I would prefer to train our users on tools that will set them up for success in public (or similarly managed private) cloud environments.

I view Conda, eventually combined with Docker (for which I am also developing a lesson), as at least a partial solution to these concerns.

Instead of a section on "when not to use Conda" I am thinking of crafting an episode on "when and how to build bespoke conda packages." I do think, though, that I should make it clear that if you have a traditional HPC workflow, then you should "consult your local experts", as conda might not be the best solution.

Anyway, hopefully this provides a bit of context and motivation for some of the lesson design choices that I have made thus far. Thanks again for the feedback; it is much appreciated!

D

On Thu, Jun 13, 2019 at 3:48 PM Maxime Boissonneault <[email protected]> wrote:

> On 2019-06-12 10:56 AM, Michael Sarahan wrote:
>
> That's a good point, but rather than say "don't use conda at all" - that's
> more reason to have custom channels where conda is set up to comply with
> those needs.
> Conda need not be mutually exclusive with these things, but
> it does take some setup to get them working together.
>
> That's not what I'm saying. What I'm saying is consult with your local
> experts.
>
> On our clusters, *we* tell users don't use conda.
>
> We provide a comprehensive list of precompiled python wheels. There is
> absolutely no need for conda in 99% of the cases.
>
> I don't see why we would support custom conda channels when we can just as
> well support python wheels that don't require conda.
>
> Maxime
>
> Saying "don't use conda at all" is ignoring the work that has to happen
> either way. Either you have to reproduce what conda is providing somehow,
> or you have to make conda use the part on the system side. That's
> definitely a case-by-case scenario for everyone, and we need to document
> both paths.
>
> For your example of MPI, conda packages are setup to explicitly require
> some MPI implementation where necessary. That package can come from an
> actual conda MPICH package, or it can come from a known binary compatible
> system installation that has a conda package setup to reference it. Conda
> is not dogmatic about being hermetic (unlike, say, bazel). Binary
> compatibility with external libraries can be pretty tricky, though.
>
> On Wed, Jun 12, 2019 at 9:48 AM Maxime Boissonneault <[email protected]> wrote:
>
>> Hi,
>> How about including a part about when *not* to use Conda ?
>>
>> In particular, if they are going to be computing on a supercomputer, they
>> should consult with your cluster specialists first.
>> Conda works well on somebody's desktop, but it creates a lot of problems
>> on supercomputers, because it does crazy stuff like installing MPI by
>> itself instead of relying on staff-installed modules and software packages.
>>
>> Cheers,
>>
>> Maxime
>>
>>
>> On 2019-06-12 9:49 AM, David Pugh wrote:
>>
>> All,
>>
>> I have developed a Software Carpentry style lesson for Conda and would be
>> keen to get feedback from the community!
>>
>> Website:
>>
>> https://kaust-vislab.github.io/introduction-to-conda-for-data-scientists/
>>
>> Repo:
>>
>> https://github.com/kaust-vislab/introduction-to-conda-for-data-scientists
>>
>> Thanks and look forward to hearing from you!
>>
>> David
>>
>> *The Carpentries <https://carpentries.topicbox.com/latest>* / discuss /
> see discussions <https://carpentries.topicbox.com/groups/discuss> +
> participants <https://carpentries.topicbox.com/groups/discuss/members> +
> delivery options <https://carpentries.topicbox.com/groups/discuss/subscription>
> Permalink
> <https://carpentries.topicbox.com/groups/discuss/Tb12fc97e5ee621f2-M1342d710fc4e431e3998ff41>
>
> --
> ---------------------------------
> Maxime Boissonneault
> Analyste de calcul - Calcul Québec, Université Laval
> Président - Comité de coordination du soutien à la recherche de Calcul Québec
> Team lead - Research Support National Team, Compute Canada
> Instructeur Software Carpentry
> Ph. D. en physique

------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/T32ac38bed70dbd29-Ma06c2f37eeef6a2257b1cfcc
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
