Re: Nextflow - have just used it on our HPC cluster and liked it

2023-05-08 Thread Pierre Gruet

Hi Steffen,

On 05/05/2023 at 22:16, Steffen Möller wrote:

Hello,

I must admit that I am rather impressed - a series of images was 
downloaded automatically to run under singularity, and the workflow then worked 
on its test data collection. So, with singularity (or docker) in 
Debian, and nextflow, we would have an immediate sync with what upstream offers.

I had a look at the table in Google docs and found that the one dependency that 
was once missing to get nextflow through its autotests, i.e. capsule-nextflow, 
is in the archive now. So I thought I would give the main nextflow package 
another look. I am impressed by all the work that Pierre has already invested in 
the packaging. @Pierre, where does that work stand? I only saw that kryo5 is 
listed as a dependency that is not in the archive.



It is very nice to hear from you on Nextflow, this is a motivation 
booster for the packaging!


I have indeed packaged many dependencies, and if I remember correctly I 
had two issues at this step:
- kryo 5.x is needed, but Debian has libkryo-java/2.7, which 
cannot be upgraded to version 5.x as that would break gradle (!). So we 
need a dedicated src:kryo5 package; I began working on it but took a break 
for whatever reason;
- more annoying: I need groovy 3.x, and Debian has groovy 2.4.21. 
I tried to package it but ran into some antlr4 issues on which I have to 
spend more time.
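
To make the two version gaps concrete, here is a small sketch in shell. The 
version numbers are the ones quoted above; taking 5.0 and 3.0 as the lower 
bounds of "5.x" and "3.x" is an assumption, and the helper names version_ge 
and check are made up for this example. It relies on "sort -V" from GNU 
coreutils rather than on Debian's own comparison (dpkg --compare-versions), 
so it is only an approximation that happens to suffice for simple upstream 
versions like these:

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B (hypothetical helper).
version_ge() {
    # With sort -V the smaller version comes first, so A >= B exactly
    # when B is the first line of the sorted pair.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check PACKAGE HAVE NEED: report whether HAVE satisfies the minimum NEED.
check() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (needs >= $3)"
    else
        echo "$1 $2: too old (needs >= $3)"
    fi
}

# Versions as quoted in this mail; the minimums are assumed lower bounds.
check libkryo-java 2.7 5.0
check groovy 2.4.21 3.0
```

Both calls report "too old", which is the situation described in the two 
items above.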


I think resuming work on this will be a good task after the release of 
Bookworm. I can try again and post any issues here.



Best,
Steffen



Best,

--
Pierre




Re: Re: Nextflow - have just used it on our HPC cluster and liked it

2023-05-08 Thread Steffen Möller



> Sent: Monday, 08 May 2023 at 08:45
> From: "Charles Plessy" 
> To: debian-med@lists.debian.org
> Subject: Re: Nextflow - have just used it on our HPC cluster and liked it
>
> Hi Steffen and everybody,
> 
> I also use Nextflow at work, and indeed, it makes it very easy to run a
> pipeline many times.  Also, Nextflow is a single Java executable, which
> makes it easy to deploy anywhere Java is already installed.
> 
> You probably also saw the nf-core repository of modules and pipelines.
> I like the way they are organised and find them empowering.  The
> community is very nice too.
> 
> BUT
> 
> With my Debian background it is very hard for me to adapt to the conda /
> bioconda / quay / galaxy ecosystem.  I just can not figure out who is
> responsible for what, how long the whole thing will be supported, or
> where to find the source code used to build the packages into Docker
> images and then into Singularity images, etc.  Not to mention that the
> whole paradigm behind "one tool, one minimal image" deprives me of all
> the Unix tools that I am used to enjoying in a Debian context.  In bioconda
> you have no idea whether sed is from GNU or from busybox unless you try
> it or dig for a package recipe in GitHub...

I am struggling with conda environments, I must admit. This should be mostly 
analogous to chroot environments, I keep thinking, but still ... 

> 
> NOT TO MENTION THAT
> 
> Everybody expects that these images will stay forever and for free at
> the URL where they are, while I have not seen any evidence of an
> organisation promising that it will really happen for at least a
> decade...  Without these images and the recipes to create them
> (remember the singularity <- docker <- conda <- GitHub fragmentation),
> the hope that these pipelines provide reproducibility in the long term
> is wishful thinking.

I admit that I care more about the data than the exact tools. Just rerun it with 
whatever has proven to be superior.
The longevity of https://biocontainers.pro/ will basically determine what 
we can expect, and conversely our demands will shape what is offered.

> SO
> 
> Against the stream of minimising image size to the bone while processing
> terabytes of sequencing data, I think that Debian Med images with all of
> our packages installed would be a useful alternative in many cases.

Yes - for the direct execution but also within images.

> I am already doing something along these lines on our HPC cluster to turn
> our packages into environment modules (lmod).
> 
> https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image
> 
> The size of the images is a bit less than 8 GiB, and I make a new image
> at each point release.  Would there be some interest to make such images
> in a more official way ?

We could have our own Singularity Hub (https://singularityhub.github.io/).

@Olivier, Hervé, Matúš et al. - I would happily hear from you that we just do 
not need anything like that.

Best,
Steffen




Re: Nextflow - have just used it on our HPC cluster and liked it

2023-05-08 Thread Tony Travis

On 08/05/2023 07:45, Charles Plessy wrote:

[...]  In bioconda
you have no idea whether sed is from GNU or from busybox unless you try
it or dig for a package recipe in GitHub...


Hi, Charles.

In fact, an Anaconda/Bioconda 'env' is little more than a set of shell 
environment variables and a .yaml recipe to install packages from the 
Anaconda/Bioconda repos. You can discover which copy of e.g. "sed" you 
are using in an active conda 'env' by:


  which sed

If you look at the PATH in an active env you can see why:

  printenv PATH

This also reveals more: An 'env' just overloads the existing environment 
in your Linux shell and, consequently, unless you choose to install a 
different version of a program in your 'env' your PATH still results in 
the system-managed (deb) version of e.g. "sed" being run.
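
A minimal sketch of that mechanism, assuming a POSIX shell with mktemp 
available. The temporary directory and the dummy "sed" script are hypothetical 
stand-ins created only for the demonstration, not a real conda env, and 
sed_flavour is a made-up helper. Prepending a directory to PATH changes which 
"sed" is found first, while anything not installed there still resolves to the 
system copy. The helper then guesses the implementation from the --version 
output; the BusyBox pattern reflects what BusyBox sed prints as far as I know, 
so treat it as an assumption:

```shell
#!/bin/sh
# Sketch only: "$envbin" is a hypothetical stand-in for an env's bin
# directory, not a real conda env.
envbin=$(mktemp -d)

# Before "activation", sed resolves to the system-managed copy.
system_sed=$(command -v sed)

# Install a dummy sed into the stand-in directory.
printf '#!/bin/sh\necho "sed from the env"\n' > "$envbin/sed"
chmod +x "$envbin/sed"

# "Activate": prepend the directory to PATH, remembering the old value.
old_path=$PATH
PATH="$envbin:$PATH"
env_sed=$(command -v sed)    # the copy in $envbin now shadows the system one
env_grep=$(command -v grep)  # nothing installed in the env: system copy

# "Deactivate": restore PATH.
PATH=$old_path
restored_sed=$(command -v sed)

# Guess the implementation behind a sed binary from its --version output.
sed_flavour() {
    case "$("$1" --version 2>&1 | head -n 1)" in
        *"(GNU sed)"*)   echo gnu ;;
        *"not GNU sed"*) echo busybox ;;   # assumed BusyBox wording
        *)               echo unknown ;;
    esac
}

echo "system sed:   $system_sed ($(sed_flavour "$system_sed"))"
echo "sed in env:   $env_sed"
echo "grep in env:  $env_grep"
echo "after reset:  $restored_sed"
rm -rf "$envbin"
```

The second line of output points into the temporary directory, while the grep 
line still shows the system path, which is the "overloading" described above.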


I run a full install of "med-bio" + Bioconda and create envs for odd 
versions of Python, Perl, R and their supporting libraries that are 
required by certain bioinformatics pipelines and that would otherwise 
conflict with the system-managed (deb) versions if installed manually.


For me, this began with QIIME, which failed its validation tests using 
the current, up-to-date, system-managed versions of supporting packages 
when Tim Booth packaged it for Bio-Linux. I used Bioconda to teach a 
course on QIIME, because Tim's Bio-Linux package gave different results 
to running QIIME on a Mac using the same data, which was a serious 
problem for my colleagues who wanted to compare their results.


As both you and Steffen have said many times, one aim of Debian-Med is 
to promote good 'reproducible' research by providing a well-defined 
environment in which to run bioinformatics pipelines. I believe the 
combination of "med-bio" + Bioconda achieves that and I have ceased my 
independent development of Bio-Linux in favour of creating a "bio-linux" 
meta-package within the Debian-Med project with help from Andreas.




I am already doing something along these lines on our HPC cluster to turn
our packages into environment modules (lmod).

https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image

The size of the images is a bit less than 8 GiB, and I make a new image
at each point release.  Would there be some interest to make such images
in a more official way ?


I have to confess my deep ignorance of "singularity", but I am quite 
interested. I created AWS and CyVerse Bio-Linux VMs a while ago and, I 
guess, I should really bring myself up-to-date now with more modern 
approaches for HPC. On that topic, has anyone tried out QLUSTAR since 
Roland Ferrenbacher changed the licence to be 100% open source?



https://qlustar.com/


The real snag, for me, is that I can't be paid to install or support it!

However, anyone can install and use QLUSTAR themselves for free and I 
can support their use of QLUSTAR for bioinformatics. I believe it was a 
promising development when Roland Ferrenbacher agreed to support and 
endorse Debian-Med in QLUSTAR, but I've not seen much interest in that 
development on our list despite Roland attending at least two Sprints.


Bye,

  Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548    http://minke-informatics.co.uk
mob. +44(0)7985 078324    mailto:tony.tra...@minke-informatics.co.uk



Re: Nextflow - have just used it on our HPC cluster and liked it

2023-05-08 Thread Charles Plessy

Hi Steffen and everybody,

I also use Nextflow at work, and indeed, it makes it very easy to run a
pipeline many times.  Also, Nextflow is a single Java executable, which
makes it easy to deploy anywhere Java is already installed.

You probably also saw the nf-core repository of modules and pipelines.
I like the way they are organised and find them empowering.  The
community is very nice too.

BUT

With my Debian background it is very hard for me to adapt to the conda /
bioconda / quay / galaxy ecosystem.  I just can not figure out who is
responsible for what, how long the whole thing will be supported, or
where to find the source code used to build the packages into Docker
images and then into Singularity images, etc.  Not to mention that the
whole paradigm behind "one tool, one minimal image" deprives me of all
the Unix tools that I am used to enjoying in a Debian context.  In bioconda
you have no idea whether sed is from GNU or from busybox unless you try
it or dig for a package recipe in GitHub...

NOT TO MENTION THAT

Everybody expects that these images will stay forever and for free at
the URL where they are, while I have not seen any evidence of an
organisation promising that it will really happen for at least a
decade...  Without these images and the recipes to create them
(remember the singularity <- docker <- conda <- GitHub fragmentation),
the hope that these pipelines provide reproducibility in the long term
is wishful thinking.

SO

Against the stream of minimising image size to the bone while processing
terabytes of sequencing data, I think that Debian Med images with all of
our packages installed would be a useful alternative in many cases.

I am already doing something along these lines on our HPC cluster to turn
our packages into environment modules (lmod).

https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image

The size of the images is a bit less than 8 GiB, and I make a new image
at each point release.  Would there be some interest to make such images
in a more official way ?

Have a nice day,

Charles

-- 
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from home  https://framapiaf.org/@charles_plessy
- You  do not have  my permission  to use  this email  to train  an AI -