Hi Carol,
I don't think this is where the subthread about Conda is heading. Jupyter notebooks are orthogonal to Anaconda: you can definitely have Jupyter without Conda. From a teaching perspective, both Conda and Jupyter notebooks do a fine job. But just as it would be beneficial to warn users about notebook caveats (hidden state and such), it would also be good to do the same for Conda caveats (performance).

Cheers,

Maxime




On 2018-08-28 6:29 PM, Carol Willing wrote:
Hi all,

There's been some positive discussion started by Joel's talk. While I liked
his talk, and there are some good points re: improving support for software
engineering best practices in Jupyter and JupyterLab notebooks, I'm a bit
concerned about the direction this conversation is going.

While all are entitled to their personal opinions and the Carpentries will use 
notebooks when and if needed, I believe that the Carpentries would be doing its 
students a disservice by warning people not to use the notebooks or conda.

The notebooks are a popular and effective tool for scientists and data scientists to have 
in their toolbox. Project Jupyter won the ACM Software System Award recently, and the ACM 
stated "These tools, which include IPython, the Jupyter Notebook and JupyterHub, 
have become a de facto standard for data analysis in research, education, journalism and 
industry." https://awards.acm.org/software-system

While it's great for folks to have different personal perspectives, I want to 
make sure that the Carpentries and its lessons do not recommend that the 
Jupyter Notebooks, IPython, and JupyterHub should be avoided by scientists and 
data scientists.

Thanks,

Carol Willing


On 28 Aug 2018, at 11:38, Maxime Boissonneault wrote:

These kinds of things are rather hard to track over time, because everything is a
moving target (Conda and other package managers are constantly updated, and
package versions change too), but here are a few more details:

- The 10x performance difference was with a user's code, which I unfortunately
can't share (nor do I still have a copy of it). It involved NumPy, so the
situation may or may not have changed now that MKL can be shipped with Anaconda.

- FFTW, 2x performance gain: these slides compare FFTW as provided by Conda (and
by other package managers) with an FFTW built on an AVX2 cluster; the locally
built one is 2x faster (see slides 28 and 29):
https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf


- TensorFlow, 7x gain for the CPU version (slide 28 of this talk):
https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/attachments/slides/2297/export/events/attachments/how_to_make_package_managers_cry/slides/2297/how_to_make_package_managers_cry.pdf

   This one did not compare against Conda itself but against the manylinux Python
wheels provided by the TensorFlow team; no doubt Conda has the same issue if it
builds for generic architectures.
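Because these numbers are hard to track over time, one way to reproduce such a comparison is to run the same workload once in each environment (for example a Conda environment and a cluster-provided module) and compare best-of-N timings. A minimal sketch using only the standard library; `best_time`, `speedup`, and `toy_workload` are hypothetical helpers, and the toy workload is a placeholder for the real user code, which isn't available.

```python
import timeit
from typing import Callable


def best_time(workload: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Return the best per-call time in seconds over `repeat` runs of `number` calls.

    Taking the minimum reduces noise from other jobs sharing the node.
    """
    runs = timeit.repeat(workload, repeat=repeat, number=number)
    return min(runs) / number


def speedup(portable_seconds: float, native_seconds: float) -> float:
    """Ratio of portable-build time to native-build time (> 1 means native is faster)."""
    if native_seconds <= 0:
        raise ValueError("native timing must be positive")
    return portable_seconds / native_seconds


def toy_workload() -> int:
    # Hypothetical placeholder: substitute the real NumPy/FFTW/TensorFlow call.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"best per-call time: {best_time(toy_workload):.6f} s")
```

Run the script in each environment and pass the two timings to `speedup(t_portable, t_native)`; a 2x gain like the one in the FFTW slides would show up as a ratio of about 2.0.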



Basically, any package that is compiled portably, as Conda packages and
manylinux wheels are, will gain some degree of speedup if it is compiled for
the target architecture instead. Building for the target architecture is
typically done by the team of analysts who manage a cluster.
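Part of what "compiled for the target architecture" buys is access to newer instruction sets such as AVX2 (the FFTW comparison above used an AVX2 cluster), which a portable build cannot assume are present. A minimal sketch, assuming a Linux node: it reads the `flags` lines of /proc/cpuinfo and reports the newest SIMD extension from a short list. The helper names and the list itself are illustrative assumptions, not an exhaustive classification.

```python
import os
from typing import Iterable, Set

# SIMD extensions relevant to portable vs. native builds, newest first.
# Illustrative, non-exhaustive list (an assumption for this sketch).
SIMD_LEVELS = ("avx512f", "avx2", "avx", "sse4_2")


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect CPU feature flags from Linux /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def best_simd_level(flags: Iterable[str]) -> str:
    """Return the newest listed SIMD extension present, or 'generic' if none."""
    available = set(flags)
    for level in SIMD_LEVELS:
        if level in available:
            return level
    return "generic"


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Linux-only: /proc/cpuinfo does not exist on macOS or Windows.
    with open("/proc/cpuinfo") as fh:
        print(best_simd_level(parse_cpu_flags(fh.read())))
```

A portable package has to run on the "generic" end of this list, while a cluster build can target whatever the nodes actually report.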

Cheers,

Maxime


On 2018-08-28 2:20 PM, Ashwin Srinath wrote:
I'd be very interested to see these examples. We use and advocate the use
of conda environments, and I'm happy to be convinced otherwise.

Thanks,
Ashwin

On Tue, Aug 28, 2018 at 2:17 PM, Maxime Boissonneault wrote:
Regarding performance, we have examples of code using Anaconda-provided
packages that run 10 times slower than the same code using locally built
packages optimized for the cluster architectures. That's not *a bit*
slower, that's a lot slower.

Regarding "cheating on your partner", that analogy is not mine, but the
point its author is trying to make is that Anaconda basically replaces any
cluster-provided versions, which HPC center staff work hard to optimize.
Recent versions of Anaconda are even worse: they package things like
compilers and linkers, which conflict with cluster-provided system
libraries and tools and create a lot of debugging problems for users and
support staff alike.
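One quick way to see whether an activated Anaconda environment is shadowing the cluster's toolchain is to check where tools such as the compiler and linker resolve on PATH. A minimal sketch, assuming the environment was activated with `conda activate` (which sets `CONDA_PREFIX`); `shadowed_by_conda` and the tool list are hypothetical, and `which` is injectable so the check can be exercised without a real Conda install.

```python
import os
import shutil
from typing import Callable, Dict, Iterable, Optional

# Tools that may be placed on PATH ahead of the cluster-provided ones
# (illustrative list, an assumption for this sketch; adjust per site).
DEFAULT_TOOLS = ("gcc", "g++", "gfortran", "ld", "mpicc")


def shadowed_by_conda(
    tools: Iterable[str] = DEFAULT_TOOLS,
    conda_prefix: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, str]:
    """Map each tool that resolves inside the Conda prefix to its resolved path."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return {}  # no active Conda environment detected
    prefix = os.path.realpath(prefix)
    hits: Dict[str, str] = {}
    for tool in tools:
        path = which(tool)
        # Only report tools whose real location is under the Conda prefix.
        if path and os.path.realpath(path).startswith(prefix + os.sep):
            hits[tool] = path
    return hits


if __name__ == "__main__":
    for tool, path in shadowed_by_conda().items():
        print(f"{tool} resolves inside the Conda environment: {path}")
```

An empty result means either no Conda environment is active or none of the listed tools resolve inside it, so the cluster-provided versions are the ones being used.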

Regards,

Maxime


On 2018-08-28 12:48 PM, Rémi Rampin wrote:

2018-08-28 12:27 EDT, Maxime Boissonneault:
As a side discussion, I think we should also be wary of using Anaconda,
and tell users not to use it in a cluster environment. For the reasons, see
here:
https://twitter.com/mboisso/status/1034476890353020928
Hi Maxime,

All I see in this thread is that "it's like cheating on your partner" (!!!)
and that it's "generically optimized software" that might be a bit slower than
locally built libraries (an interesting concern when using Python, an interpreted
scripting language, and on the slow side too).

Could you elaborate on those reasons?

Best
--
Rémi


------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/T1505f74d7f6e32f8-Mad4fadc6a6da6de2b5f2aeb9
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription

--
---------------------------------
Maxime Boissonneault
Computational Analyst - Calcul Québec, Université Laval
Chair - Research Support Coordination Committee, Calcul Québec
Team lead - Research Support National Team, Compute Canada
Software Carpentry Instructor
Ph.D. in Physics
