Hi all, I'm looking for a little advice on the pre-existing SAM/BAM filtering tools already in the Galaxy Tool Shed (to avoid reinventing the wheel).
As I mentioned on another thread, I'm working on a wrapper for the "samtools bam2fq" command (targeting samtools 1.1 which fixed some bugs in this tool and added new functionality compared to samtools 0.1.19), see: https://github.com/peterjc/pico_galaxy/tree/master/tools/samtools_bam2fq https://toolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq One of my motivating use cases is a workflow like this: 1. Upload paired end FASTQ files. 2. Map them against a known contaminant genome giving a BAM file (note I need the mapper to report unmapped reads in the output). 3. Filter the BAM to get unmapped reads, plus reads whose partner is unmapped (conversely, remove reads where both partners are mapped). 4. Convert the filtered BAM back into FASTQ (with samtools bam2fq). 5. Proceed with analysis (e.g. de novo assembly). Assuming I have understood "samtools view", this filtering step has to be multiple parts: This would get the unmapped reads $ samtools view -f 0x4 ... This would get reads with an unmapped partner: $ samtools view -f 0x8 ... However this would only get unmapped reads with an unmapped partner: $ samtools view -f 0x12 ... i.e. samtools view allows logical AND, not logical OR, when combining flag filters. So, I believe using samtools directly, a two stage filter is needed followed by a merge (and sort), taking care not to duplicate reads, perhaps: $ samtools view -f 4 ... > unmapped.bam $ samtools view -f 8 -F 4 ... > mapped_with_partner_unmapped.bam $ samtools merge unmapped.bam mapped_with_partner_unmapped.bam > ... That could be repeated within Galaxy but is surprisingly complicated with multiple steps in the history - so I do not want to go that route. Have I overlooked a simple ToolShed solution using samtools? As far as I could tell, the only other option on the current Tool Shed is the Sambamba Filter tool (using "unmapped or mate_is_unmapped"), which has a very capable looking filter system: https://toolshed.g2.bx.psu.edu/view/lomereiter/sambamba_filter @Artem - have you explored updating your tool_dependencies.xml to download your pre-compiled binaries by default? That would make deployment far easier, since D compilers are still rare, and would mean we can see the test results on the Tool Shed :) Please ask if you'd like advice on Tool Shed packaging. Thanks, Peter ___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/