Dear Jun,

On Tue, Apr 21, 2020 at 10:47:09PM +0200, Jun Aruga wrote:
> Watching COVID19 Virtual BioHackathon 2020 kick-off [1] and wrap-up
> [2] videos, I was thinking about this question.
>
> What are the most key open source projects (or packages) that we need
> to care to maintain to fight COVID-19?
>
> "key" means the packages that we care more about than other packages.
> Sorry for the ambiguous question.
> I am curious and want to be conscious about the priorities.
>
> Well, it's a broader topic. There are several factors such as
> Sequencing, Machine Learning, Graph, Workflow and etc in it.
> I shared the 3 nf-core pipelines nf-core/nanoseq, nf-core/artic,
> nf-core/viralrecon in the email thread: Subject
> https://github.com/nf-core/covid19 - nextflow pipeline . And the
> following are the used software in each pipeline.

Thank you for this analysis.

> In my opinion, the packages written in compiling language such as C,
> C++ and taking long time to compile are the "key" packages.

I personally would not subscribe to a distinction based on the technology
used to develop the software. I'd rather decide purely on usage
statistics. Sometimes the dependency tree is pretty complex, and packages
written in interpreted languages might get into conflict. IMHO it's better
to prioritise on
  a) function
  b) available tests

> Because
> users can still run the script language software without deb package,
> and users can compile software that is easy to compile by themselves.
> And the essential function's software is also the key. In this case,
> that is sequencing aligner.
>
> So, the key packages are bowtie2, minimap2, bwa in the list of the pipelines.
> And simde is used to support the packages on multiple CPU architectures [3].
>
> So, the most key packages that we care about to fight COVID-19 are in
> order to the priority.
>
> 1. simde
> 2. bowtie2 (build time is long. It's relatively hard to compile it).
> 3. minimap2
> 4. bwa

I lack the bioinformatics background to decide about this, but judging
from the usage numbers these packages seem to be frequently used.

> That's my observation.
> So, do you have any ideas or observations about the question? I would
> like to hear.
>
> Thank you.
>
> ## Used software in each pipeline.

I'm adding comments to the software packages you mentioned:

> https://github.com/nf-core/nanoseq/blob/master/bin/scrape_software_versions.py
> guppy

Missing in Debian. Is it this project
    https://staff.aist.go.jp/yutaka.ueno/guppy/ ?

> qcat
> pycoQC

Both just uploaded to NEW (including the dependency python3-parasail).

> NanoPlot

I'll add https://github.com/wdecoster/NanoPlot to our todo list.

> FastQC

In Debian.

> GraphMap2

I'll add https://github.com/lbcb-sci/graphmap2 to our todo list.

> minimap2
> Samtools
> BEDTools
> MultiQC

All four in Debian.

>
> https://github.com/nf-core/artic/blob/dev/bin/scrape_software_versions.py
> FastQC
> NanoPlot
> BWA
> minimap2
> Samtools
> BEDTools
> MultiQC

See above regarding NanoPlot - all others in Debian.

>
> https://github.com/nf-core/viralrecon/blob/dev/bin/scrape_software_versions.py
> parallel-fastq-dump

I'll add https://github.com/rvalieris/parallel-fastq-dump to our todo list.

> FastQC
> fastp
> Bowtie 2
> Samtools
> BEDTools
> Picard

All in Debian.

> iVar

I'll add https://github.com/andersen-lab/ivar to our todo list.

> VarScan 2

In Debian non-free. It's on our software liberation Wiki
    https://wiki.debian.org/DebianMed/SoftwareLiberation
It would be a *huge* service to the community to convince upstream to
adopt a free license.

> SnpEff

Ahhhh, that one rings a bell. It's hard since several of its predepends
are not yet packaged. I've spent hours on it before, but I'll add this
to our todo list:
    https://salsa.debian.org/med-team/snpeff

> SnpSift

Same source as SnpEff (see above).

> BCFTools
> Cutadapt
> Kraken2
> SPAdes
> Unicycler
> minia
> Minimap2
> vg
> BLAST
> ABACAS

All in Debian.

> QUAST

That's a pretty complex assembly of third-party software (for instance
including their own copies of bwa, minimap2, bedtools and lots of others).
For instance, it was my motivation to package sambamba, which on its own is
quite a complex packaging project (being RC buggy half of the time of its
existence :-(). It also includes genemark as a binary, since it is
non-free - see again our software liberation page
    https://wiki.debian.org/DebianMed/SoftwareLiberation
-> I'd like to repeat that freeing this would be very sensible.
In short, packaging quast is pretty tough - but there is at least a weak
(not building yet!) start:
    https://salsa.debian.org/med-team/quast

> R

Well, R in itself is cheap - if some specific R packages are used that we
have not packaged yet, this should be easily doable.

> MultiQC

In Debian.

In general your list of software is extremely helpful. Thanks a lot for it.
I've added it to the covid-19 task[4] (which will be re-rendered soon).
The said todo list where I've added the projects is in the COVID-19
coordination wiki[5].

As always: Everybody is kindly invited to pick from the todo list. Please
do not underestimate the todo items about contacting authors to free their
code. Every little contribution here is *extremely* helpful and highly
appreciated.

Thanks again Jun for your very helpful contribution

      Andreas.

> [1] https://youtu.be/x-QTP5Z_WIU
> [2] https://youtu.be/g5cQk8jIMwo
> [3] https://wiki.debian.org/SIMDEverywhere

[4] https://blends.debian.org/med/tasks/covid-19
[5] https://salsa.debian.org/med-team/community/2020-covid19-hackathon/-/wikis/COVID-19-Hackathon-packages-needing-work

-- 
http://fam-tille.de

