Re: Nextflow - have just used it on our HPC cluster and liked it
Hi Steffen,

On 05/05/2023 at 22:16, Steffen Möller wrote:
> Hello,
>
> I must admit that I am rather impressed - a series of images were
> auto-downloaded to function together with singularity and this then
> worked on a test data collection of the workflow. So, with singularity
> (or docker) in Debian, and nextflow, we would have an immediate sync
> with what upstream offers.
>
> I had a look at the table in Google docs and found that the one
> dependency that was once missing to get nextflow through its
> autotests, i.e. capsule-nextflow, is in the archive now. So I thought
> to give the main package of nextflow another look. I am impressed by
> all the work that Pierre already invested into the packaging.
>
> @Pierre, where does that work stand? I only saw that kryo5 is listed
> as a dependency that is not in the archive.

It is very nice to hear from you on Nextflow, this is a motivation
booster for the packaging!

I have indeed packaged many dependencies, and if I remember correctly I
had two issues at this step:

- kryo 5.x is needed although we have libkryo-java/2.7 in Debian, which
  cannot be upgraded to version 5.x as it would break gradle (!). So we
  need a dedicated src:kryo5 package; I began working on it but took a
  break for whatever reason;
- more annoying: I need groovy 3.x and in Debian we have groovy 2.4.21.
  I tried to package it but ran into some antlr4 issues on which I have
  to spend more time.

I think picking this work up again will be a good task after the
release of Bookworm.

> I can try again and post possible issues here.
>
> Best,
> Steffen

Best,

-- 
Pierre
Aw: Re: Nextflow - have just used it on our HPC cluster and liked it
> Sent: Monday, 8 May 2023 at 08:45
> From: "Charles Plessy"
> To: debian-med@lists.debian.org
> Subject: Re: Nextflow - have just used it on our HPC cluster and liked it
>
> Hi Steffen and everybody,
>
> I also use Nextflow at work, and indeed, it makes it very easy to run a
> pipeline many times. Also, Nextflow is a single Java executable, which
> makes it easy to deploy anywhere Java is already installed.
>
> You probably also saw the nf-core repository of modules and pipelines.
> I like the way they are organised and find them empowering. The
> community is very nice too.
>
> BUT
>
> With my Debian background it is very hard for me to adapt to the conda /
> bioconda / quay / galaxy ecosystem. I just cannot figure out who is
> responsible for what, how long the whole thing will be supported, where
> to find the source code used to build the packages into Docker images
> and then into Singularity images, etc. Not to mention that the whole
> paradigm behind "one tool, one minimal image" deprives me of all the
> Unix tools that I am used to enjoying in a Debian context. In bioconda
> you have no idea whether sed is from GNU or from busybox unless you try
> it or dig for a package recipe in GitHub...

I am struggling with conda environments, I must admit. This should be
mostly analogous to chroot environments, I keep thinking, but still ...

> NOT TO MENTION THAT
>
> Everybody expects that these images will stay forever and for free at
> the URL where they are, while I have not seen any evidence of an
> organisation promising that it will really happen for at least a
> decade... Without these images and the recipes to create them
> (remember the singularity <- docker <- conda <- GitHub fragmentation),
> the hope that these pipelines provide reproducibility in the long term
> is wishful thinking.

I admit to caring more about the data than the exact tools. Just rerun
it with whatever was proven to be superior.
The longevity of https://biocontainers.pro/ will basically determine
what we can expect, and conversely our demands will shape what will be
offered.

> SO
>
> Against the stream of minimising image size to the bone while processing
> terabytes of sequencing data, I think that Debian Med images with all of
> our packages installed would be a useful alternative in many cases.

Yes - for the direct execution but also within images.

> I am already doing something along these lines on our HPC cluster to
> turn our packages into environment modules (lmod).
>
> https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image
>
> The size of the images is a bit less than 8 GiB, and I make a new image
> at each point release. Would there be some interest in making such
> images in a more official way?

We could have our own Singularity Hub (https://singularityhub.github.io/).

@Olivier, Hervé, Matúš et al. - I would happily hear from you that we
just do not need anything like that.

Best,
Steffen
Re: Nextflow - have just used it on our HPC cluster and liked it
On 08/05/2023 07:45, Charles Plessy wrote:
> [...] In bioconda you have no idea whether sed is from GNU or from
> busybox unless you try it or dig for a package recipe in GitHub...

Hi, Charles.

In fact, an Anaconda/Bioconda 'env' is little more than a set of shell
environment variables and a .yaml recipe to install packages from the
Anaconda/Bioconda repos. You can discover which copy of e.g. "sed" you
are using in an active conda 'env' by:

    which sed

If you look at the PATH in an active env you can see why:

    printenv PATH

This also reveals more: an 'env' just overloads the existing environment
in your Linux shell and, consequently, unless you choose to install a
different version of a program in your 'env', your PATH still results in
the system-managed (deb) version of e.g. "sed" being run.

I run a full install of "med-bio" + Bioconda and create envs for odd
versions of Python, Perl, R and their supporting libraries that are
required by certain bioinformatics pipelines and that would otherwise
conflict with the system-managed (deb) versions if installed manually.

For me, this began with QIIME, which failed its validation tests using
the current, up-to-date, system-managed versions of supporting packages
when Tim Booth packaged it for Bio-Linux. I used Bioconda to teach a
course on QIIME, because Tim's Bio-Linux package gave different results
to running QIIME on a Mac using the same data, which was a serious
problem for my colleagues who wanted to compare their results.

As both you and Steffen have said many times, one aim of Debian-Med is
to promote good 'reproducible' research by providing a well-defined
environment in which to run bioinformatics pipelines. I believe the
combination of "med-bio" + Bioconda achieves that and I have ceased my
independent development of Bio-Linux in favour of creating a "bio-linux"
meta-package within the Debian-Med project with help from Andreas.
> I am already doing something along these lines on our HPC cluster to
> turn our packages into environment modules (lmod).
>
> https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image
>
> The size of the images is a bit less than 8 GiB, and I make a new image
> at each point release. Would there be some interest in making such
> images in a more official way?

I have to confess my deep ignorance of "singularity", but I am quite
interested. I created AWS and CyVerse Bio-Linux VMs a while ago and, I
guess, I should really bring myself up-to-date now with more modern
approaches for HPC.

On that topic, has anyone tried out QLUSTAR since Roland Ferrenbacher
changed the licence to be 100% open source?

https://qlustar.com/

The real snag, for me, is that I can't be paid to install or support it!
However, anyone can install and use QLUSTAR themselves for free and I
can support their use of QLUSTAR for bioinformatics.

I believe it was a promising development when Roland Ferrenbacher agreed
to support and endorse Debian-Med in QLUSTAR, but I've not seen much
interest in that development on our list despite Roland attending at
least two Sprints.

Bye,
Tony.

-- 
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548    http://minke-informatics.co.uk
mob. +44(0)7985 078324    mailto:tony.tra...@minke-informatics.co.uk
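For anyone who wants to try the PATH point from the message above, here is a minimal sketch of the two checks: where a tool resolves from, and whether the sed found is GNU sed. It assumes that `conda activate` has set `CONDA_PREFIX`; the function names `tool_origin` and `sed_flavour` are made up for illustration, and the `--version` test relies on GNU sed printing a banner containing "(GNU sed)".

```shell
#!/bin/sh
# Report whether a tool on PATH comes from the active conda env or the system.
# Assumption: CONDA_PREFIX is set by `conda activate`; if it is unset or
# empty, nothing can match the env prefix and the tool is reported as "system".
tool_origin() {
    tool_path=$(command -v "$1") || { echo "$1: not found"; return 1; }
    case "$tool_path" in
        "${CONDA_PREFIX:-/nonexistent}"/*) echo "$1: conda env ($tool_path)" ;;
        *) echo "$1: system ($tool_path)" ;;
    esac
}

# Classify the sed found first on PATH as GNU or something else.
# The match is on the parenthesised "(GNU sed)" because, as far as I know,
# BusyBox answers --version with a line that also contains "GNU sed".
sed_flavour() {
    if sed --version 2>/dev/null | head -n 1 | grep -qF '(GNU sed)'; then
        echo "GNU"
    else
        echo "other"
    fi
}

tool_origin sed
sed_flavour
```

Run once inside an activated env and once outside it: if the env does not ship its own sed, both runs should report the system (deb) copy.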
Re: Nextflow - have just used it on our HPC cluster and liked it
Hi Steffen and everybody,

I also use Nextflow at work, and indeed, it makes it very easy to run a
pipeline many times. Also, Nextflow is a single Java executable, which
makes it easy to deploy anywhere Java is already installed.

You probably also saw the nf-core repository of modules and pipelines.
I like the way they are organised and find them empowering. The
community is very nice too.

BUT

With my Debian background it is very hard for me to adapt to the conda /
bioconda / quay / galaxy ecosystem. I just cannot figure out who is
responsible for what, how long the whole thing will be supported, where
to find the source code used to build the packages into Docker images
and then into Singularity images, etc. Not to mention that the whole
paradigm behind "one tool, one minimal image" deprives me of all the
Unix tools that I am used to enjoying in a Debian context. In bioconda
you have no idea whether sed is from GNU or from busybox unless you try
it or dig for a package recipe in GitHub...

NOT TO MENTION THAT

Everybody expects that these images will stay forever and for free at
the URL where they are, while I have not seen any evidence of an
organisation promising that it will really happen for at least a
decade... Without these images and the recipes to create them
(remember the singularity <- docker <- conda <- GitHub fragmentation),
the hope that these pipelines provide reproducibility in the long term
is wishful thinking.

SO

Against the stream of minimising image size to the bone while processing
terabytes of sequencing data, I think that Debian Med images with all of
our packages installed would be a useful alternative in many cases.

I am already doing something along these lines on our HPC cluster to
turn our packages into environment modules (lmod).

https://github.com/oist/BioinfoUgrp/blob/master/DebianMedModules.md#creation-of-a-new-singularity-image

The size of the images is a bit less than 8 GiB, and I make a new image
at each point release.
Would there be some interest in making such images in a more official
way?

Have a nice day,

Charles

-- 
Charles Plessy                         Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team         http://www.debian.org/devel/debian-med
Tooting from home                  https://framapiaf.org/@charles_plessy
- You do not have my permission to use this email to train an AI -
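As a rough sketch of what an "all of Debian Med in one image" build could look like, the snippet below prints a Singularity/Apptainer definition file. It is not taken from the linked OIST recipe. The base image `debian:<release>` from Docker Hub, the choice of the med-bio metapackage as the package set, and the function name `debian_med_def` are all assumptions made for illustration.

```shell
#!/bin/sh
# Print a Singularity/Apptainer definition file for a "fat" Debian Med image.
# Assumptions (not from the thread): the Docker Hub base image
# debian:<release> and the med-bio metapackage as the package set.
debian_med_def() {
    release=$1    # Debian release codename, e.g. bookworm
    cat <<EOF
Bootstrap: docker
From: debian:${release}

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    apt-get install -y med-bio
    apt-get clean

%labels
    Description Debian Med (med-bio) on Debian ${release}
EOF
}

debian_med_def bookworm

# Building needs Singularity/Apptainer and suitable privileges, so it is
# only shown here, not run:
#   debian_med_def bookworm > debian-med.def
#   singularity build --fakeroot debian-med-bookworm.sif debian-med.def
```

Keeping the definition file as the only input means the image can be rebuilt at each point release from the Debian archive alone.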