In general, you can use 'NGS: Picard (beta) -> SAM to FASTQ' to
extract sequences (convert BAM -> SAM first), but this tool does not add
extra sequence from the reference genome (or pad the associated
quality scores, etc.). I don't know of a Galaxy-wrapped tool that does
this, but you might check the Tool Shed or other public Galaxy servers.
Others reading this post may also have advice.
Now, going from *BAM* -> coordinates (bed/interval) -> *FASTA*
sequence is possible in a few ways. The general idea is that the
coordinates are manipulated to extend the mapped footprint and then the
sequence is extracted from the reference genome. Any content in the
original reads that differs from the reference is lost, but maybe this
still has some utility for you. The two methods below show how to do
this, with the 2nd being
simpler, if the genome is at UCSC. There are other ways to get flanking
sequence, merge/cluster, etc. (see tools in group 'Operate on Genomic
Intervals') but below are the most direct methods per-sequence to simply
And if you need to filter down multi-mapped data, use the tool 'NGS:
SAM Tools -> Filter SAM' (converting between BAM and SAM as needed).
*1st method, works for any genome, including a custom reference genome:*
1 - convert BAM to SAM with 'NGS: SAM Tools -> BAM-to-SAM'
2 - convert SAM to interval with 'NGS: SAM Tools -> Convert SAM', or
convert to bed with 'BEDTools -> Convert from BAM to BED'
3 - split the file into two, one representing the (+) strand alignments
and one the (-), using the tool 'Filter and Sort -> Filter'
4 - adjust the start or end coordinate to extend the alignment footprint
as wanted using the tool 'Text Manipulation -> Compute'. Remember that
for negative stranded coordinates, the "start" is really where the end
of the sequence aligned and "end" is where the start of the sequence
aligned - interval files report coordinates with respect to (+) strand,
smallest -> largest.
5 - cut out the columns to create a standard interval file again,
swapping in the new coordinates. Click on the pencil icon to make
attribute assignment for columns and to assign a reference genome as
needed - this information is required by the next tool.
6 - get the fasta sequence by using the tool 'Fetch Sequences -> Extract
7 - merge all fasta results together with the tool 'Text Manipulation ->
8 - if you need fastq format, you can pad out quality scores and create
that with the tool 'NGS: QC and manipulation -> Combine FASTA and QUAL'
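For anyone who prefers to check the coordinate arithmetic from steps 3-5 outside of Galaxy, here is a minimal Python sketch. It assumes 6-column BED input (0-based, half-open coordinates, strand in column 6) and a 150bp target length; the function names `extend_interval` and `extend_bed_line` are made up for illustration and are not Galaxy tools.

```python
def extend_interval(start, end, strand, target_len=150, chrom_len=None):
    """Extend an interval from the read's 5' end until it spans target_len.

    Coordinates are assumed to be BED-style: 0-based, half-open, and
    reported with respect to the (+) strand (start < end).
    """
    if end - start >= target_len:
        # Already long enough; leave the footprint unchanged.
        return start, end
    if strand == "+":
        # The read's 5' end is `start`, so grow toward larger coordinates.
        new_start, new_end = start, start + target_len
    elif strand == "-":
        # The read's 5' end is `end`, so grow toward smaller coordinates.
        new_start, new_end = end - target_len, end
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Clip to the chromosome; the upper bound is only applied if known.
    new_start = max(new_start, 0)
    if chrom_len is not None:
        new_end = min(new_end, chrom_len)
    return new_start, new_end


def extend_bed_line(line, target_len=150):
    """Apply extend_interval to one tab-separated 6-column BED line."""
    fields = line.rstrip("\n").split("\t")
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    new_start, new_end = extend_interval(start, end, strand, target_len)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)


if __name__ == "__main__":
    print(extend_bed_line("chr1\t1000\t1050\tread1\t60\t+"))
    print(extend_bed_line("chr1\t1000\t1050\tread2\t60\t-"))
```

The (-) strand case is the one that is easy to get backwards: the "end" column is held fixed and the "start" column is moved, which is the same adjustment described in step 4.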
*2nd method, if the reference genome is at UCSC:*
1 - convert BAM to bed with 'BEDTools -> Convert from BAM to BED'
2 - click on the "view at UCSC main" link for the dataset
3 - once at UCSC Browser, the data will show up as a custom track, by
default named "User Track" in the top track group. Click on the track
name - it will take you to the track controls and focus the browser on
4 - in the top blue menu bar, click on "Tools -> Table Browser". This
track will now be pre-loaded in the form with all options probably set
as you want them (this user track is selected and "region" is "genome")
- except for one - change "output format" from "BED" to be "sequence
5 - confirm that the "Galaxy" box is checked, and click on "get output"
6 - the next form has options for extending the sequence at the 5' and/or
3' ends, all in one go; adjust as you want
7 - click on "Send query to Galaxy" and the dataset will load back into
the working history
8 - the fasta can be converted to fastq as in the 1st method, step #8
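To make the "pad out quality scores" idea concrete, here is a rough Python sketch of turning FASTA into FASTQ with a constant placeholder quality. This is not what the Galaxy tool does internally; the helper names `parse_fasta` and `fasta_to_fastq` are hypothetical, and the default quality character "I" (Phred 40 in the Sanger/Phred+33 encoding) is an arbitrary assumption. The padded scores carry no real base-call information.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(lines, qual_char="I"):
    """Build FASTQ text, giving every base the same placeholder quality."""
    records = []
    for name, seq in parse_fasta(lines):
        # One quality character per base, so both lines have equal length.
        records.append(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    example = [">read1", "ACGTAC", ">read2", "GG"]
    print(fasta_to_fastq(example), end="")
```

Sequences wrapped over several FASTA lines are joined before the quality string is built, so the two always match in length.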
Hopefully some of this is helpful!
On 9/11/13 1:56 AM, Tobias Hohenauer wrote:
I am working on an MNase-Seq experiment with 50bp single-end reads. To
identify nucleosome positions, I read that one needs to extend the
single reads to approximately the length of nucleosome-protected DNA,
which is about 150bp.
Is there a way in Galaxy to extend 50bp reads to 150bp length, let's
say from a .BAM file with mapped reads?
Of course any other comment on this topic is much appreciated!
Thank you very much,