Re: [galaxy-user] extension of read length

Jennifer Jackson Thu, 12 Sep 2013 00:54:56 -0700

Hi Tobias,

In general, you can use *'NGS: Picard (beta) -> SAM to FASTQ'* to extract sequences (convert BAM -> SAM first), but this tool does not add in extra sequence based on the reference genome (or pad the associated quality scores, etc.). I don't know of a Galaxy-wrapped tool that does this, but you might check the Tool Shed or other public Galaxy servers. Others reading this post may also have advice.

Now, going from *BAM* -> coordinates (bed/interval) -> *FASTA* sequence is possible in a few ways. The general idea is that the coordinates are manipulated to extend the mapped footprint and then the sequence is extracted from the reference genome. Any novel content in the original sequence (anything that differs from the reference) is lost, but maybe this still has some utility for you. The two methods below show how to do this, with the 2nd being simpler if the genome is at UCSC. There are other ways to get flanking sequence, merge/cluster, etc. (see tools in group 'Operate on Genomic Intervals'), but below are the most direct methods to simply extend each sequence.

And if you need to filter down multi-mapped data, use the tool 'NGS: SAM Tools -> Filter SAM' (converting between SAM and BAM as needed).
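
To illustrate the "extract the sequence from the reference genome" part of the general idea above, here is a minimal Python sketch. It is not what any Galaxy tool does internally. The function names, the in-memory `reference` dictionary, and the assumption of BED-style 0-based, half-open coordinates are all choices made for this example only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def extract_sequence(reference, chrom, start, end, strand):
    """Pull the sequence for one interval out of a reference.

    Assumptions for this sketch:
      - `reference` is a dict mapping chromosome name -> sequence string
      - start/end are BED-style (0-based start, end not included)
      - minus-strand intervals are returned reverse-complemented
    """
    seq = reference[chrom][start:end]
    if strand == "-":
        return reverse_complement(seq)
    return seq


# Toy reference, only for demonstration.
reference = {"chr1": "AAACCCGGGTTT"}
plus_seq = extract_sequence(reference, "chr1", 2, 6, "+")
minus_seq = extract_sequence(reference, "chr1", 2, 6, "-")
```

Because the sequence comes from the reference, any mismatches or insertions present in the original read do not appear in the result.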


*1st method, works for any genome, including a custom reference genome:*

1 - convert BAM to SAM with 'NGS: SAM Tools -> BAM-to-SAM'

2 - convert SAM to interval with 'NGS: SAM Tools -> Convert SAM' or convert to bed with 'BEDTools -> Convert from BAM to BED'

3 - split the file into two: one representing the (+) strand alignments, one the (-) using the tool 'Filter and Sort -> Filter'

4 - adjust the start or end coordinate to extend the alignment footprint as wanted using the tool 'Text Manipulation -> Compute'. Remember that for negative stranded coordinates, the "start" is really where the end of the sequence aligned and "end" is where the start of the sequence aligned - interval files report coordinates with respect to (+) strand, smallest -> largest.

http://wiki.galaxyproject.org/Learn/Datatypes#Interval

5 - cut out the columns to create a standard interval file again, swapping in the new coordinates. Click on the pencil icon to make attribute assignments for columns and to assign a reference genome as needed - this information is required by the next tool.

6 - get the fasta sequence by using the tool 'Fetch Sequences -> Extract Genomic DNA'

7 - merge all fasta results together with the tool 'Text Manipulation -> Concatenate datasets'

8 - if you need fastq format, you can pad out quality scores and create that with the tool 'NGS: QC and manipulation -> Combine FASTA and QUAL'
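
The strand-dependent coordinate adjustment in steps 3-4 can be sketched in Python as follows. This only illustrates the arithmetic and is not the Galaxy 'Compute' tool itself. The function name, the `target_len` and `chrom_len` parameters, and the use of BED-style 0-based coordinates are assumptions made for the example. The 50bp and 150bp values come from the original question.

```python
def extend_interval(start, end, strand, target_len, chrom_len=None):
    """Extend an aligned footprint to `target_len` bases in the read's 3' direction.

    Coordinates are reported on the (+) strand, smallest -> largest, so:
      - (+) strand reads grow by increasing `end`
      - (-) strand reads grow by decreasing `start`
    `chrom_len` (optional, hypothetical parameter) clips the end at the
    chromosome boundary; the start is always clipped at 0.
    """
    extra = max(target_len - (end - start), 0)  # never shrink an interval
    if strand == "+":
        end += extra
        if chrom_len is not None:
            end = min(end, chrom_len)
    else:
        start = max(start - extra, 0)
    return start, end


# A 50bp read extended to 150bp on each strand.
plus_read = extend_interval(100, 150, "+", 150)
minus_read = extend_interval(1000, 1050, "-", 150)
near_edge = extend_interval(20, 70, "-", 150)
```

Here `plus_read` is `(100, 250)`, `minus_read` is `(900, 1050)`, and `near_edge` is clipped to `(0, 70)` because the extension would run past the start of the chromosome.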



*2nd method, if the reference genome is at UCSC:*

1 - convert BAM to BED with 'BEDTools -> Convert from BAM to BED'
2 - click on the "view at UCSC main" link for the dataset

3 - once at UCSC Browser, the data will show up as a custom track, by default named "User Track" in the top track group. Click on the track name - it will take you to the track controls and focus the browser on this track.

4 - in the top blue menu bar, click on "Tools -> Table Browser". This track will now be pre-loaded in the form with all options probably set as you want them (this user track is selected and "region" is "genome") - except for one - change "output format" from "BED" to be "sequence"

5 - confirm that the "Galaxy" box is checked, and click on "get output"

6 - the next form has options for extending the sequence at 5' and/or 3' ends, all in one go, adjust as you want

7 - click on "Send query to Galaxy" and the dataset will load back into the working history

8 - the fasta can be converted to fastq as in the 1st method, step #8
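
For reference, the "pad out quality scores" conversion from fasta to fastq mentioned in step #8 amounts to something like the Python sketch below. It is a simplified stand-in, not the 'Combine FASTA and QUAL' tool: the function name and the `qual_char` parameter are made up for the example, and every base gets the same placeholder quality ("I", which is Phred 40 in the Sanger/Phred+33 encoding). These scores carry no real information.

```python
def fasta_to_fastq(fasta_text, qual_char="I"):
    """Convert FASTA text to FASTQ text with a constant placeholder quality.

    Multi-line FASTA sequences are joined into a single line per record.
    `qual_char` is a hypothetical parameter: one quality character that is
    repeated once per base.
    """
    records = []
    name, parts = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:], []
        else:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))

    # One four-line FASTQ record per sequence.
    return "".join(
        f"@{rec_name}\n{seq}\n+\n{qual_char * len(seq)}\n"
        for rec_name, seq in records
    )


example = fasta_to_fastq(">r1\nACGT\nAC\n>r2\nGG\n")
```

Downstream tools that weight bases by quality will treat every padded base as equally reliable, so the placeholder is only suitable where the quality values are not actually used.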

Hopefully some of this is helpful!

Jen
Galaxy team


On 9/11/13 1:56 AM, Tobias Hohenauer wrote:

Dear all,
I am working on an MNase-Seq experiment with 50bp single end reads. To identify nucleosome positions, I read that one needs to extend the single reads to approximately the length of nucleosome protected DNA, being approximately 150bp.
Is there a way in Galaxy to extend 50bp reads to 150bp length, let's say from a .BAM file with mapped reads?
Of course any other comment on this topic is much appreciated!

Thank you very much,

Tobias


--
Jennifer Hillman-Jackson
http://galaxyproject.org

