Jeremy,

Thank you very much for this information.  One quick question: I added the 
gene_id values to the 10th column of my patched GTF file.  After uploading it 
to Galaxy, the column doesn't have a name (e.g. column 1 = Seqname; column 2 = 
Source; etc.).  Do I need to assign it a name (e.g. gene_name or gene_id) for 
it to be recognized, and if so, how do you assign column names to GTF files?

Thanks,
David


From: Jeremy Goecks [mailto:jeremy.goe...@emory.edu]
Sent: Thursday, April 07, 2011 9:40 PM
To: David K Crossman
Cc: galaxy-user
Subject: Re: [galaxy-user] RNA seq analysis and GTF files

David,

Your analysis looks reasonable. In fact, in your isoform tracking FPKM file you 
get nearest_ref_id, so that's promising. What I think is needed is the addition 
of an attribute called gene_name to your reference file; you can use whatever 
value you want for gene name, and using the same value as gene_id probably 
makes sense.
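
As a rough illustration of that suggestion, here is a minimal Python sketch. It 
assumes a standard GTF with nine tab-separated columns, where attributes such 
as gene_id sit in the ninth column as key "value"; pairs. The function name 
add_gene_name is hypothetical, not part of Galaxy or Cufflinks.

```python
import re

# Matches the gene_id attribute inside the 9th (attributes) column.
GENE_ID = re.compile(r'gene_id "([^"]*)"')

def add_gene_name(gtf_line):
    """Return the GTF line with gene_name (copied from gene_id) appended.

    Hypothetical helper: lines that are comments, lack exactly nine
    tab-separated columns, lack gene_id, or already carry gene_name
    are returned unchanged (minus the trailing newline).
    """
    line = gtf_line.rstrip("\n")
    if not line or line.startswith("#"):
        return line
    fields = line.split("\t")
    if len(fields) != 9:
        return line
    attrs = fields[8].rstrip()
    match = GENE_ID.search(attrs)
    if match is None or "gene_name" in attrs:
        return line
    if not attrs.endswith(";"):
        attrs += ";"
    # Reuse the gene_id value as the gene_name value.
    fields[8] = f'{attrs} gene_name "{match.group(1)}";'
    return "\t".join(fields)
```

To patch a whole file, apply the function to each line and write the result 
plus a newline to a new file, then upload that file as the reference 
annotation.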

Rerun your analysis with the further-patched GTF file, and let us know if this 
doesn't solve the problem. Also note that even using this attribute, some gene 
name/ids and some nearest_ref_id columns will not be populated in some cuffdiff 
files. See the post from Howie in this thread for an explanation from a 
Cufflinks developer: http://seqanswers.com/forums/showthread.php?t=6288

Best,
J.

On Apr 7, 2011, at 5:00 PM, David K Crossman wrote:


Jeremy,

I've shared it with you using your email address.

Thanks,
David


From: Jeremy Goecks [mailto:jeremy.goe...@emory.edu]
Sent: Thursday, April 07, 2011 3:42 PM
To: David K Crossman
Cc: galaxy-user
Subject: Re: [galaxy-user] RNA seq analysis and GTF files

David, can you please share your history with me so I can take a look? (History 
Options --> Share/Publish --> Share with User --> my email)

Thanks,
J.

On Apr 7, 2011, at 3:23 PM, David K Crossman wrote:



Hello!

I would like to ask a question related to the thread below.  I ran into the 
same issues and was unaware of having to swap some columns around in the GTF 
file.  So, after "swapping the gene name from the complete table (name2 value, 
column 12) into the GTF file's gene_id value (which by default is the same as 
transcript_id)," I uploaded this "patched" file (mm9) into Galaxy and ran 
Cufflinks, CuffCompare and CuffDiff using it as the reference annotation.  For 
both Cufflinks and CuffCompare, the gene_id was present in their respective 
columns.  The problem I have encountered now is that in all of the CuffDiff 
output files, the gene_id column is blank (contains a "-"; see the "gene" 
column below).  This example is from the CuffDiff gene expression output file:

test_id      gene  locus                 sample_1  sample_2  status  value_1  value_2  ln(fold_change)  test_stat  p_value   significant
XLOC_000001  -     chr1:4797973-4836816  q1        q2        OK      73.1908  82.1567  0.115559         -0.71896   0.472168  no
XLOC_000002  -     chr1:4847774-4887990  q1        q2        OK      81.7264  53.1165  -0.43089         2.44474    0.014496  no
XLOC_000003  -     chr1:5073253-5152630  q1        q2        OK      408.289  333.749  -0.20159         2.73173    0.0063    no
XLOC_000004  -     chr1:5578573-5596214  q1        q2        NOTEST  2.34764  4.79772  0.71473          -0.89735   0.369532  no


What am I doing wrong?  I am interested in the differentially expressed genes 
in this RNA-Seq dataset (as well as calling variants, which is my next step, 
but I want to get this answered first before moving on).  Any info, 
suggestions or help would be greatly appreciated.
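
For anyone wanting to quantify the problem, the following is a rough Python 
sketch that counts how many rows have the "-" placeholder in the gene column. 
It assumes the CuffDiff output is tab-separated with a header row like the one 
shown above; the function name count_missing_genes is hypothetical.

```python
def count_missing_genes(lines, column="gene", placeholder="-"):
    """Return (missing, total) for the named column of a tab-separated table.

    Hypothetical helper. `lines` is any iterable of text lines whose first
    line is the header; rows too short to contain the column are skipped.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\n").split("\t")
    idx = header.index(column)  # raises ValueError if the column is absent
    missing = total = 0
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= idx:
            continue
        total += 1
        if cols[idx] == placeholder:
            missing += 1
    return missing, total
```

Passing an open file handle for the downloaded output file works, since a file 
object iterates over its lines.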

Thanks,
David


-----Original Message-----
From: galaxy-user-boun...@lists.bx.psu.edu On Behalf Of Jeremy Goecks
Sent: Friday, April 01, 2011 8:47 AM
To: ssa...@ccib.mgh.harvard.edu
Cc: galaxy-user
Subject: Re: [galaxy-user] RNA seq analysis and GTF files



On Mar 31, 2011, at 12:30 PM, <ssa...@ccib.mgh.harvard.edu> wrote:

> Hi Jeremy,
> I used your exercise to perform an RNA-seq analysis. First I encountered a 
> problem where the gene IDs were missing from the results. Jen from the Galaxy 
> team suggested this:
>
> "Yes, the team has taken a look and there are a few things going on.
>
> The first is that when running the Cuffcompare program, a reference 
> annotation file in GTF format should be used in order to obtain the same 
> results as in Jeremy's exercise. This seemed to be missing from your runs, 
> which resulted in badly formatted output that later resulted in a poor result 
> when Cuffdiff was used.
>
> The second has to do with the reference GTF file itself. For the best 
> results, the GTF file must have the "gene_id" attribute defined in the 9th 
> column of the file and the chromosome names must be in the same format as the 
> genome native to Galaxy. Depending on the source of the reference GTF, one of 
> these may need to be adjusted. Chromosome names can be adjusted using 
> Galaxy's "Text Manipulation" tools. The gene_id attribute would need to be 
> adjusted prior to loading into Galaxy.
>
> For mm9, using the "Get Data -> UCSC Main table browser" tool can help you to 
> obtain all of the raw data necessary to create a complete GTF file with a 
> gene_id identifier. Extract data from the track "RefSeq Genes" and output the 
> primary data table "refGene" twice - first in GTF format, then again as the 
> complete table in tabular format (not BED). Then, using your own tools, swap 
> in the gene name from the complete table (name2 value, column 12) into the 
> GTF file's gene_id value (which by default is the same as transcript_id). 
> Upload and the tools will function as intended.
>
> The team is aware of the issues associated with GTF source files and is 
> discussing solutions. Any changes to native data content will be reported to 
> the mailing list in a News Brief or other communications.
>
> Our apologies for the inconvenience! Thanks for using Galaxy and
> please let us know if we can help again,
>
> Best,
>
> Jen
> Galaxy team"
>
>
> I followed the directions (or at least I think I did) and things seemed to 
> work better, but there is one more issue. For example, in the file
> Galaxy287-[Cuffdiff_on_data_197,_data_197,_and_data_274__isoform_FPKM_tracking].tabular.txt
> the column gene_short_name does not have any names in it. nearest_ref_id 
> does have the gene ID info, so I can still interpret the data, but I was 
> wondering if there remains another problem with the GTF file that I'm not 
> aware of.
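
The gene_id swap described in the quoted instructions can be sketched in Python 
roughly as follows. This is only an illustration under stated assumptions: the 
complete refGene table is tab-separated and starts with a header row naming the 
"name" (transcript ID) and "name2" (gene name) fields, and each GTF line 
carries gene_id and transcript_id attributes. The function names load_name2 and 
swap_gene_id are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')
GENE_ID = re.compile(r'gene_id "[^"]*"')

def load_name2(table_lines, name_field="name", name2_field="name2"):
    """Build {transcript ID: gene name} from the tabular refGene export.

    Assumes the first line is a header (optionally starting with '#')
    that contains both field names.
    """
    rows = iter(table_lines)
    header = next(rows).lstrip("#").rstrip("\n").split("\t")
    name_i, name2_i = header.index(name_field), header.index(name2_field)
    mapping = {}
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > max(name_i, name2_i):
            mapping[cols[name_i]] = cols[name2_i]
    return mapping

def swap_gene_id(gtf_line, mapping):
    """Replace the gene_id value with the gene name for this transcript.

    Lines without a transcript_id, or with one missing from the mapping,
    are returned unchanged.
    """
    match = TRANSCRIPT_ID.search(gtf_line)
    if match is None or match.group(1) not in mapping:
        return gtf_line
    gene = mapping[match.group(1)]
    # A function replacement avoids interpreting backslashes in the name.
    return GENE_ID.sub(lambda _: f'gene_id "{gene}"', gtf_line, count=1)
```

Looking fields up by header name rather than by a fixed column number sidesteps 
the question of whether "column 12" is counted from zero or one.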

Slim,

Please send questions to the galaxy-user mailing list (cc'd) rather than 
individual Galaxy team members; there are many people on the list that may be 
able to address your question, and discussions are archived for future use as 
well. Without seeing your analysis, I'd suggest trying two things:

(1) Provide gene annotation reference file to Cufflinks as well as Cuffcompare 
and Cuffdiff; in other words, you'll want to do guided assembly.
(2) Try using an Ensembl GTF, which has the gene name in the attributes.

I think (2) is more likely to generate the results you want, but there are 
many known problems in using Ensembl GTFs with Cufflinks/compare/diff.

Good luck,
J.
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and 
other features on the public server at usegalaxy.org.  
Please keep all replies on the list by using "reply all" in your mail client.  
For discussion of local Galaxy instances and the Galaxy source code, please use 
the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/
