Re: [galaxy-user] Extract sequences from [gtf file] + [genome FASTA file]

Karen Tang Fri, 28 Jan 2011 11:10:26 -0800

I was thinking of something different. Here is an example of a three-exon transcript, in GTF format:

contig00035 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1";
contig00035 Cufflinks exon 3 10 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "1";
contig00035 Cufflinks exon 13 18 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "2";
contig00035 Cufflinks exon 20 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "3";


and the genome sequence that the transcript comes from is:

contig00035

GTAGCGTCTCCGACGCGGATATGACCGCACGCTGATGCTCCCAGGGATGAGAGGCGTGCG

I want the sequence for this transcript: I want to extract from the genome sequence the subsequences for positions 3-10, 13-18, and 20-22, and then concatenate the three subsequences to create the transcript sequence.

In this case, it would be AGCGTCTC + ACGCGG + TAT, meaning the transcript sequence would be AGCGTCTCACGCGGTAT.
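
For reference, the operation described above can be sketched outside Galaxy in a few lines of Python. This is only an illustration, not a Galaxy tool: the function name transcript_sequences and the in-memory genome dictionary are made up for this example, coordinates are taken as 1-based and inclusive as in GTF, and reverse-complementing minus-strand transcripts is an assumption about the desired output.

```python
import re
from collections import defaultdict

# Translation table for reverse-complementing minus-strand transcripts
# (assumption: that is what is wanted for "-" strand records).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def transcript_sequences(gtf_lines, genome):
    """Return {transcript_id: spliced sequence} built from GTF exon records.

    gtf_lines: iterable of tab-separated GTF lines.
    genome: dict mapping sequence name -> sequence string (hypothetical
            in-memory stand-in for the genome FASTA file).
    """
    exons = defaultdict(list)
    strands = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Only exon records contribute sequence; skip transcript rows etc.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((fields[0], int(fields[3]), int(fields[4])))
        strands[tid] = fields[6]

    spliced = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda part: part[1])  # order exons by start position
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[name][start - 1:end] for name, start, end in parts)
        if strands[tid] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        spliced[tid] = seq
    return spliced


# The three-exon example from this message.
attrs = 'gene_id "CUFF.23955"; transcript_id "CUFF.23955.1";'
example_gtf = [
    "\t".join(["contig00035", "Cufflinks", "transcript", "3", "22", "1000", "+", ".", attrs]),
    "\t".join(["contig00035", "Cufflinks", "exon", "3", "10", "1000", "+", ".", attrs + ' exon_number "1";']),
    "\t".join(["contig00035", "Cufflinks", "exon", "13", "18", "1000", "+", ".", attrs + ' exon_number "2";']),
    "\t".join(["contig00035", "Cufflinks", "exon", "20", "22", "1000", "+", ".", attrs + ' exon_number "3";']),
]
example_genome = {
    "contig00035": "GTAGCGTCTCCGACGCGGATATGACCGCACGCTGATGCTCCCAGGGATGAGAGGCGTGCG",
}

result = transcript_sequences(example_gtf, example_genome)
print(result["CUFF.23955.1"])  # AGCGTCTCACGCGGTAT
```

Exons are grouped by transcript_id and sorted by start position before joining, so the order of the exon lines in the file should not matter.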


Is it possible to do this in Galaxy?

Karen :)

On Thu, 27 Jan 2011, Jennifer Jackson wrote:

Hello Karen,
The following general workflow should help you to pull sequences from any source.
1) cut out the sequence IDs from the query (in this case, a GTF or BED file) and sort them.
Text Manipulation -> Cut columns from a table
Filter and Sort -> Sort
2) convert the target fasta file to tabular format
Convert Formats ->  FASTA-to-Tabular converter
3) join the two datasets based on the sequence ID
Join, Subtract and Group -> Join two Queries
4) convert to fasta
Convert Formats -> Tabular-to-FASTA
5) when starting with a GTF file, there will most likely be duplicates. To remove, use:
NGS: QC and manipulation -> Collapse sequences
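
As a rough illustration of what steps 1-5 compute, here is a minimal Python sketch of the same join-by-ID idea outside Galaxy. The function names read_fasta and sequences_for_ids are hypothetical and are not Galaxy tools; the sketch assumes the sequence ID is in the first tab-separated column of the query, and it removes duplicates by collecting IDs into a set rather than by collapsing identical sequences. Note that it returns whole sequences matched by ID, not sub-ranges.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence} (cf. step 2)."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence ID.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def sequences_for_ids(table_lines, fasta_lines, id_column=0):
    """Return sorted (id, sequence) pairs for IDs found in both inputs."""
    records = read_fasta(fasta_lines)
    # Steps 1 and 5: cut the ID column, drop duplicates, sort.
    ids = sorted({
        line.rstrip("\n").split("\t")[id_column]
        for line in table_lines
        if line.strip() and not line.startswith("#")
    })
    # Step 3: keep only IDs that are present in the FASTA file.
    return [(seq_id, records[seq_id]) for seq_id in ids if seq_id in records]


# Small made-up inputs: two query rows on the same contig, two FASTA records.
query_rows = [
    "contig00035\tCufflinks\texon\t3\t10",
    "contig00035\tCufflinks\texon\t13\t18",
]
fasta_rows = [
    ">contig00035",
    "GTAGCGTCTCCGACGCGGATATGACCGCACGCTGATGCTCCCAGGGATGAGAGGCGTGCG",
    ">contig00036",
    "ACGT",
]

picked = sequences_for_ids(query_rows, fasta_rows)
# Step 4: write the result back out in FASTA format.
for seq_id, seq in picked:
    print(">" + seq_id)
    print(seq)
```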
Once you create the actual workflow that performs the job, be sure to save it so that you can just re-use it whenever you need to perform the same task. To do this, from the history pane (rightmost) use Options -> Extract workflow and follow the instructions on the form to customize.
Hopefully this helps,

Jen
Galaxy team

On 1/26/11 12:05 PM, Karen Tang wrote:
Hi Galaxy people,

I have transcripts predicted by Cufflinks that are in a gtf file. How
can I extract the sequences corresponding to those transcripts, using
Galaxy?

[Cufflinks transcript predictions in gtf file] + [Genome sequence in
FASTA file] ---> [FASTA file of transcript sequences]

My genome is a custom genome (not at UCSC).

---------

I'll also need to do the same thing, except my predicted transcripts are
in a Scripture bed file.

Thanks for your help!

Karen Tang :)
Plant Biology
University of Minnesota

_______________________________________________
galaxy-user mailing list
[email protected]
http://lists.bx.psu.edu/listinfo/galaxy-user
