Thanks for your reply!
My raw RNA-seq data were mapped to hg19 without a reference GTF on our
local instance. To troubleshoot, I tried the following:
(1) Use TopHat to map the data again to hg19 with the iGenome ensembl.GTF,
then use Cuffdiff to find differentially expressed genes. There are still
250 significant genes.
(2) Use TopHat to map the data again to hg19 without a reference GTF, then
use Cufflinks with Homo_sapiens.GRCh37.69.gtf downloaded from ensembl.org.
Same result: 250 significant genes.
(3) Use TopHat to map the data again to hg19 without a reference GTF, then
use Cufflinks with the RefSeq refFlat.GTF. The result is ~1000 significant
genes.
(4) Use TopHat to map the data again to hg19 without a reference GTF, then
use Cufflinks with the iGenome refseq.GTF. The result is ~1000 significant
genes.
However, I still need to confirm which release or version of the hg19
reference genome I am using. Do you think the different results are caused
by mapping to different hg19 genomes? If so, how can I match an hg19 build
to the correct reference GTF? I thought the choice of Ensembl or RefSeq
would not affect the results of the Cuffdiff step. These reference GTF files
(refFlat.GTF, iGenome refseq.GTF, or iGenome ensembl.GTF) should represent
On Mon, Jan 7, 2013 at 5:27 PM, Jennifer Hillman-Jackson <j...@bx.psu.edu> wrote:
> Hello Wei,
> The contents of the reference GTF files (original, before analysis) will
> probably provide some explanation. My guess is that GTF files have
> different contents and are not directly comparable - RefSeq with full
> transcripts and Ensembl with full transcripts + potentially partial
> predictions and/or predicted splice sites. Alternative versions of each may
> be available. When possible, you most likely will want to be using a
> reference GTF file that represents complete transcripts.
> I don't know what genome you are using, but you can check the source notes
> at Ensembl (& NCBI) to find out what each annotation build contains. A raw
> count on the number of entries in the GTF files can also be a clue - if
> greatly different, then you very likely have different populations in the
> two files.
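The raw-count comparison suggested above could be sketched roughly as
follows. This is only an illustration, not part of Galaxy or Cufflinks: the
function names summarize_gtf and summarize_gtf_file are hypothetical, and the
code assumes standard 9-column, tab-separated GTF records whose attributes
look like gene_id "..."; transcript_id "...";. It also collects the
chromosome names used in each file, since those have to match the names in
the genome the reads were mapped to.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: gene_id "ENSG00000000003";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Summarize GTF records: counts per feature type, distinct gene and
    transcript IDs, and the set of chromosome (seqname) labels."""
    features = Counter()
    genes, transcripts, seqnames = set(), set(), set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        seqnames.add(fields[0])
        features[fields[2]] += 1
        attrs = dict(_ATTR.findall(fields[8]))
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
    return {
        "records": sum(features.values()),
        "features": dict(features),
        "genes": len(genes),
        "transcripts": len(transcripts),
        "seqnames": sorted(seqnames),
    }


def summarize_gtf_file(path):
    """Convenience wrapper (hypothetical helper) that reads a GTF from disk."""
    with open(path) as handle:
        return summarize_gtf(handle)


# Example use, assuming the files from this thread are in the working directory:
# for name in ("Homo_sapiens.GRCh37.69.gtf", "refFlat.GTF"):
#     print(name, summarize_gtf_file(name))
```

If the gene or transcript counts differ greatly between two files, or the
seqnames lists do not agree with the chromosome names in the reference
genome, the annotations are probably not directly comparable.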
> Good luck with your project!
> Galaxy team
> On 1/7/13 1:47 PM, Wei Liao wrote:
>> Hi all,
>> I am analyzing significantly differentially expressed genes for a
>> normal vs. tumor pair, using Cuffdiff 2.0.2.
>> I noticed that the Ensembl GTF and the RefSeq GTF give very different
>> numbers of significantly differentially expressed genes.
>> With the Ensembl GTF, only 250 genes are differentially expressed.
>> With the RefSeq GTF, there are about 1000 genes.
>> I am running these data on a Galaxy server with the same workflow.
>> Can anyone explain what is going on here, and which result should I trust?
>> Wei Liao
>> Research Scientist,
>> Brentwood Biomedical Research Institute
>> 16111 Plummer St.
>> Bldg 7, Rm D-122
>> North Hills, CA 91343
>> 818-891-7711 ext 7645
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org. Please keep all replies on the list by
>> using "reply all" in your mail client. For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
> Jennifer Hillman-Jackson
> Galaxy Support and Training