This explanation is very clear. Thank you. I was wondering about some of
these issues as well.

It would be wonderful if Galaxy could somehow make it possible to provide a
BED file for the -G option, or make it feasible to use GTF/BED output from the
Table Browser tool as input to the -G option. (Maybe it already does?)

GTF is an awkward format and BED would work just as well, if not better.
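As an illustration of the BED-to-GTF direction, here is a minimal Python sketch that turns one BED12 line into GTF exon lines, one per BED block. It makes several assumptions: the input is tab-separated BED12, and the BED name is reused as both gene_id and transcript_id because BED has no gene grouping. The function name bed12_to_gtf and the default source label are hypothetical. Coordinates shift from BED's 0-based, half-open convention to GTF's 1-based, inclusive one.

```python
def bed12_to_gtf(bed_line: str, source: str = "bed2gtf") -> list[str]:
    """Convert one BED12 record into GTF 'exon' lines (hypothetical helper)."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    strand = fields[5] if len(fields) > 5 else "."
    if len(fields) >= 12:
        # blockSizes and blockStarts are comma-separated, often with a trailing comma
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        rel_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    else:
        # No block columns: treat the whole interval as a single exon
        sizes, rel_starts = [end - start], [0]
    # Assumption: BED carries no gene grouping, so the name stands in for both IDs
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    lines = []
    for size, rel in zip(sizes, rel_starts):
        exon_start = start + rel + 1  # 0-based half-open -> 1-based inclusive
        exon_end = start + rel + size
        lines.append("\t".join([chrom, source, "exon", str(exon_start),
                                str(exon_end), ".", strand, ".", attrs]))
    return lines
```

Because every transcript becomes its own "gene" here, tools that group isoforms by gene_id would not do so; a faithful conversion would need a separate transcript-to-gene mapping.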

Best wishes,

Ann Loraine

-------------------------------
Ann Loraine, Ph.D.
Associate Professor
Department of Bioinformatics and Genomics
University of North Carolina at Charlotte
North Carolina Research Campus
600 Laureate Way
Kannapolis, NC 28081
704-250-5750
alora...@uncc.edu
http://www.transvar.org
http://www.bioviz.org
http://www.uncc.edu


From: Jennifer Jackson <j...@bx.psu.edu>
Date: Wed, 13 Jun 2012 16:12:27 -0700
To: Kristen Roop <kristen.r...@gmail.com>
Cc: <galaxy-u...@bx.psu.edu>
Subject: Re: [galaxy-user] Galaxy Reference Genome

Hello Kristen,

Our RNA-seq tutorial and FAQ can help out with the general workflow:

https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq

And an iGenomes reference annotation GTF dataset for mm9 is in the Shared
Libraries here (import "genes.gtf" into your history; please ignore the other
content, as it is under revision):

http://usegalaxy.org  -> Shared Data -> Data Libraries  -> iGenomes -> mm9


To address your questions, one key misunderstanding may be the difference 
between a "reference genome" and a "reference annotation" dataset.

* "reference genome" = genomic sequence (sourced in .fasta format) that the
data is mapped against with TopHat and that serves as a scaffold for the RNA-seq
tools. Since you are using mm9, selecting the "built-in index" for mm9 is an
appropriate choice. A reference genome provides no annotation beyond genomic
positional coordinates. Mapping tools, including TopHat, have parameters that
specify whether to keep only the best hit or all hits; it sounds as if you need
to adjust these parameters in your run. The filter you ran (question #2) may
have removed most or all hits. Check the output from the SAM filter: was it
greatly reduced or empty? If so, re-run TopHat with parameters that keep the
best hit from the start, and move on to Cufflinks from there without filtering
through SAMtools. Help is on the tool form itself and in the links to the manual.

* "reference annotation" = known transcripts (sourced in .gtf or .gff3 format) 
that are also mapped against the reference genome. These transcript annotations 
are the most useful when they contain gene, transcript start site, and other 
key attributes that the Cuff* tools can interpret. This annotation can guide 
assembly at various levels (loose or strict) depending on how the tool 
parameters are configured. The annotation MUST be mapped to the same exact 
reference genome that your FASTQ datasets are mapped to, with the same exact 
chromosome naming (see the RNA-seq FAQ for details). Help, including links to
the manuals, is also on the Cuff* tool forms.
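The two requirements above (usable attributes and identical chromosome naming) can be checked before running the Cuff* tools. Below is a minimal Python sketch that assumes a plain-text GTF (for example the imported genes.gtf) and the header lines of a SAM file produced against the same genome. All function names are hypothetical. It flags features missing gene_id or transcript_id, and it lists chromosome names that appear in the annotation but not in the SAM @SQ header, such as "1" versus "chr1".

```python
import re

# GTF column 9 holds key "value"; pairs, e.g. gene_id "g1"; transcript_id "t1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gtf_attributes(column9):
    """Parse the GTF attribute column into a dict."""
    return dict(ATTR_RE.findall(column9))


def gtf_chromosomes(gtf_lines):
    """Return (chromosome names, count of features lacking gene_id/transcript_id)."""
    chroms, missing = set(), 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chroms.add(cols[0])
        attrs = gtf_attributes(cols[8])
        if "gene_id" not in attrs or "transcript_id" not in attrs:
            missing += 1
    return chroms, missing


def sam_reference_names(sam_header_lines):
    """Collect SN: values from @SQ header lines (the genome's chromosome names)."""
    names = set()
    for line in sam_header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def unmatched_chromosomes(gtf_lines, sam_header_lines):
    """Chromosomes named in the annotation but absent from the mapped genome."""
    chroms, _ = gtf_chromosomes(gtf_lines)
    return sorted(chroms - sam_reference_names(sam_header_lines))
```

An empty result from unmatched_chromosomes is necessary but not sufficient: it confirms the names line up, not that both files come from the same genome build.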

More help, including links to tool help, is on our wiki here
(see "Tools on the Main server: Example: unexpected results with RNA-seq
analysis tools"):
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results

Hopefully this helps,

Jen
Galaxy team

On 6/13/12 7:07 AM, Kristen Roop wrote:
Hello,

Galaxy Main

1.) I am having trouble adding annotations to my Tophat and Cufflinks runs.
I used the Mus musculus mm9 reference with the built-in index for the Tophat
mapping, but no annotations were available in the output files.
I then tried converting the reference genome from UCSC to a SAM file using
SAMtools. Tophat would not recognize this, but Cufflinks did. The Cufflinks
output file did not have the annotation either.

Any thoughts on the proper way to add annotations?



2.) I am also trying to separate the singly mapped reads from the multiply mapped
reads that resulted from Tophat. After converting the output file from Tophat, I
used the Filter tool in SAMtools, choosing "0x100 map is not primary".
Afterwards I tried to run Cufflinks on the filtered output, only to have it fail.
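This minimal Python sketch shows what a flag-based filter keeps and how to tell whether its output has become empty. It assumes plain-text SAM (convert BAM first, e.g. with SAMtools) and assumes the aligner wrote the standard NH tag. The function names are hypothetical. In SAM, FLAG bit 0x100 marks a secondary alignment. Dropping records with that bit set leaves one primary line per read, and multi-mapped reads are still included. Selecting NH:i:1 is one way to keep only reads reported at a single locus.

```python
SECONDARY = 0x100  # SAM FLAG bit: secondary (non-primary) alignment


def alignment_records(sam_lines):
    """Yield alignment lines, skipping '@' header lines and blank lines."""
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            yield line.rstrip("\n")


def nh_count(record):
    """Return the NH:i: value (number of reported alignments), or None if absent."""
    for tag in record.split("\t")[11:]:  # optional tags follow the 11 mandatory fields
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def split_records(sam_lines):
    """Return (all_records, primary_records, unique_records)."""
    records = list(alignment_records(sam_lines))
    primary = [r for r in records if not int(r.split("\t")[1]) & SECONDARY]
    unique = [r for r in primary if nh_count(r) == 1]
    return records, primary, unique
```

Comparing len(records) before filtering with the number of alignment lines left afterwards gives the "greatly reduced or empty?" check suggested in the reply above. If nothing is left, Cufflinks has no input to assemble.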


My ultimate goal is to look at RNA-seq gene expression. I know that I have to
upload my files -> groom using FASTQ Groomer -> download a reference sequence
from UCSC -> convert the reference genome file to a usable format -> run Tophat
for mapping using the groomed file and the converted reference annotation ->
filter the single mapped reads -> run Cufflinks using the filtered single
mapped reads from Tophat.
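The grooming step in this pipeline exists because FASTQ quality strings can use different ASCII offsets, and the groomer has to be told which one the input uses. The following is a rough Python heuristic for guessing that offset. It assumes unwrapped 4-line FASTQ records, and the function name is hypothetical. Treat the result as a hint, not a replacement for knowing the sequencing platform's encoding.

```python
def guess_quality_offset(fastq_lines):
    """Guess the Phred ASCII offset (33 or 64) from 4-line FASTQ records.

    Returns None when no qualities are seen or the characters fit both encodings.
    """
    lowest = None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        quals = line.rstrip("\n")
        if quals:
            low = min(ord(c) for c in quals)
            lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    if lowest < 59:   # below ';': only Phred+33 (Sanger) uses these characters
        return 33
    if lowest >= 64:  # '@' or above throughout: consistent with Phred+64
        return 64
    return None       # ';' to '?': ambiguous (Sanger Q26-30 or Solexa range)
```

A file whose qualities are all high can look like Phred+64 even when it is Sanger, so a result of 64 from a small sample deserves a second look.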

From here I will continue with some other statistical analysis, but right now I
need to get this basic pipeline to work.


Thanks,
Kristen Roop



___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/


--
Jennifer Jackson
http://galaxyproject.org
