Hello Boel,

Yes, this is the expected format. We know that having the
transcript ID in both fields (transcript_id and gene_id) is not ideal, but
it is a consequence of how the Table Browser extracts data (only one table
at a time for certain output types, such as GTF).

To retrieve both transcript and gene information, it would be better to
use the ensGene table in genePred format (in which each row is a
transcript) and link in the gene ID from the related table ensGtp. You can
do this by selecting the output format "selected fields from primary and
related tables".
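
If you already have the GTF output and a tab-separated dump of ensGtp, the
same link can be made offline. The following is only a rough sketch: it
assumes the ensGtp dump has the columns gene, transcript, protein in that
order, the function names are made up for illustration, and ENSG_EXAMPLE is
a placeholder rather than a real identifier.

```python
import re

# Patterns for the two attributes that appear in the Table Browser GTF output.
_TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
_GENE_RE = re.compile(r'gene_id "[^"]*"')


def load_transcript_to_gene(lines):
    """Build {transcript_id: gene_id} from tab-separated ensGtp rows.

    Assumed column order: gene, transcript, protein.
    Header lines starting with '#' and blank lines are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        gene, transcript = fields[0], fields[1]
        mapping[transcript] = gene
    return mapping


def fix_gene_id(gtf_line, mapping):
    """Replace gene_id with the gene mapped to this line's transcript_id.

    Lines without a transcript_id, or with an unmapped transcript,
    are returned unchanged.
    """
    match = _TRANSCRIPT_RE.search(gtf_line)
    if match is None:
        return gtf_line
    gene = mapping.get(match.group(1))
    if gene is None:
        return gtf_line
    return _GENE_RE.sub(lambda _: 'gene_id "%s"' % gene, gtf_line, count=1)


# Example with a placeholder gene ID (not a real Ensembl identifier).
example_map = load_transcript_to_gene(
    ["#gene\ttranscript\tprotein\n",
     "ENSG_EXAMPLE\tENST00000371026\tENSP_EXAMPLE\n"]
)
example_line = (
    "chr1\thg18_ensGene\tCDS\t67052401\t67052451\t0.000000\t-\t0\t"
    'gene_id "ENST00000371026"; transcript_id "ENST00000371026";'
)
fixed_line = fix_gene_id(example_line, example_map)
```

To process a whole file, read the ensGtp dump once with
load_transcript_to_gene and then pass each GTF line through fix_gene_id.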

If you really want GTF format, the original file from the source is 
available here:
ftp://ftp.ensembl.org/pub/release-55/gtf/homo_sapiens/Homo_sapiens.GRCh37.55.gtf.gz

Please note that the coordinate system used by Ensembl is not the same
as that used by UCSC.
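
One difference you are likely to run into is sequence naming: Ensembl GTF
files use names such as "1", "X" and "MT", whereas UCSC uses "chr1", "chrX"
and "chrM". The sketch below handles only that renaming, under the
assumption that this is the adjustment you need. The function names are
hypothetical, unplaced scaffolds are left untouched, and no positions are
converted between assemblies.

```python
def ensembl_to_ucsc_name(name):
    """Map an Ensembl sequence name to the UCSC-style name.

    Only the numbered chromosomes, X, Y and MT are renamed; anything
    else (for example scaffold names) is returned unchanged.
    """
    if name == "MT":
        return "chrM"
    if name.startswith("chr"):
        return name  # already UCSC-style
    if name in ("X", "Y") or name.isdigit():
        return "chr" + name
    return name


def rename_gtf_line(line):
    """Rewrite the first (seqname) column of one GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # comment or blank line
    seqname, sep, rest = line.partition("\t")
    return ensembl_to_ucsc_name(seqname) + sep + rest
```
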

Thanks,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/18/10 6:00 AM, [email protected] wrote:
> Dear All,
>
> I've used the Table Browser to download all Ensembl genes
> from the Mar. 2006 assembly in GTF format. This results in a list with
> 1,040,440 entries, which I suppose could be correct. But on each line
> the transcript ID and gene ID are set to the same value (the transcript ID):
>
> head ensGene.gtf
> chr1	hg18_ensGene	CDS	67052401	67052451	0.000000	-	0	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	exon	67051162	67052451	0.000000	-	.	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	CDS	67060632	67060788	0.000000	-	1	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	exon	67060632	67060788	0.000000	-	.	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	CDS	67065091	67065317	0.000000	-	0	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	exon	67065091	67065317	0.000000	-	.	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	CDS	67066083	67066181	0.000000	-	0	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	exon	67066083	67066181	0.000000	-	.	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	CDS	67071856	67071977	0.000000	-	2	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1	hg18_ensGene	exon	67071856	67071977	0.000000	-	.	gene_id "ENST00000371026"; transcript_id "ENST00000371026";
>
> Is this a bug, or am I making some mistake here? If I am, how can I
> retrieve the correct file?
>
> Thank you,
> Boel
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome