Hello Boel,

Yes, this is the expected format. We know that having the transcript ID in both fields (transcript and gene) is not ideal, but it is a consequence of how the Table Browser extracts data (only one table at a time for certain output types, such as GTF).

To retrieve both transcript and gene information, it would be better to use the ensGene table in genePred format (which actually contains transcripts) and link in the gene name from the table ensGtp. You can do this by selecting the output format "selected fields from primary and related tables".

If you really want GTF format, the original file from the source is available here:
ftp://ftp.ensembl.org/pub/release-55/gtf/homo_sapiens/Homo_sapiens.GRCh37.55.gtf.gz

Please note that the coordinate system used by Ensembl is not the same as that used by UCSC.

Thanks,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/18/10 6:00 AM, [email protected] wrote:
> Dear All,
>
> I've used the Table Browser to download all Ensembl genes
> from the Mar. 2006 assembly in GTF format. This results in a list with
> 1,040,440 entries, which I suppose could be correct. But on each line
> the transcript ID and gene ID are set to the same value (the transcript ID):
>
> head ensGene.gtf
> chr1 hg18_ensGene CDS 67052401 67052451 0.000000 - 0 gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene exon 67051162 67052451 0.000000 - . gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene CDS 67060632 67060788 0.000000 - 1 gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene exon 67060632 67060788 0.000000 - . gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene CDS 67065091 67065317 0.000000 - 0 gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene exon 67065091 67065317 0.000000 - . gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene CDS 67066083 67066181 0.000000 - 0 gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene exon 67066083 67066181 0.000000 - . gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene CDS 67071856 67071977 0.000000 - 2 gene_id "ENST00000371026"; transcript_id "ENST00000371026";
> chr1 hg18_ensGene exon 67071856 67071977 0.000000 - . gene_id "ENST00000371026"; transcript_id "ENST00000371026";
>
> Is this a bug, or am I making some mistake here? If I am, how can I
> retrieve the correct file?
>
> Thank you,
> Boel
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
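
For anyone who has already downloaded the Table Browser GTF, the linking step described in the reply (attaching gene IDs from ensGtp to the transcripts) could also be applied after the fact. The following is only a minimal sketch under stated assumptions: it assumes ensGtp was exported as tab-separated text with the gene ID in the first column and the transcript ID in the second, and that header lines start with "#". The function names and the placeholder IDs (ENSG_PLACEHOLDER, ENSP_PLACEHOLDER) are hypothetical and not part of any UCSC tool.

```python
import re

# Attribute patterns in the GTF attributes column.
GENE_ID_RE = re.compile(r'gene_id "[^"]*";')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)";')


def load_transcript_to_gene(lines):
    """Build {transcript_id: gene_id} from tab-separated rows.

    Assumption: column 1 is the gene ID and column 2 is the transcript ID,
    as in an ensGtp export; lines starting with '#' are treated as headers.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        gene, transcript = fields[0], fields[1]
        mapping[transcript] = gene
    return mapping


def fix_gene_ids(gtf_lines, mapping):
    """Yield GTF lines with gene_id replaced by the mapped gene ID.

    Lines without a transcript_id, or whose transcript is not in the
    mapping, are passed through unchanged.
    """
    for line in gtf_lines:
        match = TRANSCRIPT_ID_RE.search(line)
        gene = mapping.get(match.group(1)) if match else None
        if gene is None:
            yield line
        else:
            replacement = 'gene_id "%s";' % gene
            yield GENE_ID_RE.sub(lambda _m: replacement, line, count=1)


if __name__ == "__main__":
    # Hypothetical mapping row; the real gene ID would come from ensGtp.
    table = ["ENSG_PLACEHOLDER\tENST00000371026\tENSP_PLACEHOLDER\n"]
    gtf = ['chr1\thg18_ensGene\tCDS\t67052401\t67052451\t0.000000\t-\t0\t'
           'gene_id "ENST00000371026"; transcript_id "ENST00000371026";\n']
    for fixed in fix_gene_ids(gtf, load_transcript_to_gene(table)):
        print(fixed, end="")
```

Only the gene_id attribute is rewritten; coordinates and all other columns are left exactly as the Table Browser produced them.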
