Good morning Daniel:

If you can use the kent command line utilities, available at:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

then the genePredToGtf command can give you better GTF files than the Table Browser does. UCSC does not keep gene structures in GTF format. We use a single-line format, with all the information about a single gene on that one line, the genePred format:
http://genome.ucsc.edu/FAQ/FAQformat.html#format9

To use the kent commands, add this three-line file ".hg.conf" to your home directory:

$ cat .hg.conf
db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password

And set the permissions:

$ chmod 600 .hg.conf

Now you can use the command to extract GTF files directly from the UCSC database. For example, to fetch the UCSC gene track (knownGene) from hg19 into the local file knownGene.gtf:

$ genePredToGtf hg19 knownGene knownGene.gtf

Note the usage message from the command:

> genePredToGtf - Convert genePred table or file to gtf.
> usage:
>    genePredToGtf database genePredTable output.gtf
> If database is 'file' then track is interpreted as a file
> rather than a table in database.
> options:
>    -utr - Add 5UTR and 3UTR features
>    -honorCdsStat - use cdsStartStat/cdsEndStat when defining start/end
>     codon records
>    -source=src set source name to uses
>    -addComments - Add comments before each set of transcript records.
>     allows for easier visual inspection
> Note: use refFlat or extended genePred table to include geneName

You can also fetch the database text dump of the genePred content for the track, to have the file on hand locally:
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz

The SQL structure:
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.sql

--Hiram

Daniel Standage wrote:
> Hi all.
>
> I just downloaded gene structure annotations in GTF for hg19. I've noticed
> that the gene_id and transcript_id for each feature are the same. How does
> one know whether two transcripts belong to the same gene (alternatively
> spliced isoforms)?
>
> Thanks!
>
> PS I ask because I am trying to convert the data into GFF3 format. I assume
> that you have made a conscious choice not to share data in this format and
> that I have not missed it. I was able to find a few scripts (via Google) to
> do the GTF -> GFF3 conversion, but they all include a caveat about GTF from
> UCSC. So now I'm creating my own...
>
> --
> Daniel S. Standage
> Graduate Research Assistant
> Bioinformatics and Computational Biology Program
> Department of Genetics, Development, and Cell Biology
> Iowa State University

_______________________________________________
Genome maillist
https://lists.soe.ucsc.edu/mailman/listinfo/genome
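For anyone working from the knownGene.txt.gz dump rather than the live database, the core of a genePred-to-GTF conversion is small. The Python sketch below is only an illustration, not the genePredToGtf implementation: it writes exon records only (genePredToGtf also writes CDS and start/stop codon lines), and the names parse_genepred and to_gtf_exons are hypothetical. It assumes the standard genePred column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), with refFlat rows carrying an extra geneName column in front, and it shifts genePred's 0-based half-open starts to GTF's 1-based closed coordinates.

```python
"""Sketch: convert genePred / refFlat rows into GTF "exon" lines.

Simplification (assumption): only exon features are written; the real
genePredToGtf also emits CDS and start/stop codon records.
"""
import gzip
import sys


def parse_genepred(line, has_gene_name=False):
    """Parse one tab-separated genePred row; refFlat rows add geneName first."""
    fields = line.rstrip("\n").split("\t")
    gene_name = None
    if has_gene_name:
        gene_name, fields = fields[0], fields[1:]
    # Standard genePred columns: name, chrom, strand, txStart, txEnd,
    # cdsStart, cdsEnd, exonCount, exonStarts, exonEnds (comma-terminated lists).
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return {
        # Without a geneName column, fall back to the transcript name.
        "gene": gene_name or fields[0],
        "transcript": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "exons": list(zip(starts, ends)),
    }


def to_gtf_exons(rec, source="sketch"):
    """Return GTF exon lines for one parsed record."""
    attrs = f'gene_id "{rec["gene"]}"; transcript_id "{rec["transcript"]}";'
    out = []
    for start, end in rec["exons"]:
        # genePred is 0-based half-open, GTF is 1-based closed:
        # shift the start by one, keep the end as-is.
        out.append("\t".join([rec["chrom"], source, "exon", str(start + 1),
                              str(end), ".", rec["strand"], ".", attrs]))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sketch.py knownGene.txt.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        for line in fh:
            for gtf_line in to_gtf_exons(parse_genepred(line)):
                print(gtf_line)
```

For refFlat rows, call parse_genepred(line, has_gene_name=True) so that gene_id comes from geneName. For plain knownGene rows this sketch has no gene name to use, so gene_id simply repeats the transcript name.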
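On Daniel's original question, the usage note points to refFlat (or an extended genePred table) to get gene names. Assuming a refFlat-style dump whose first column is geneName and whose second is the transcript name, a hypothetical helper like isoforms_by_gene can collect the transcripts that share a gene. Because one gene symbol can occur at more than one locus, this sketch keys on (geneName, chrom, strand) rather than on the name alone; that grouping rule is an assumption, not UCSC's definition of a gene.

```python
"""Sketch: group refFlat transcripts into candidate genes (isoform sets).

Assumption: input lines are tab-separated refFlat rows
(geneName, name, chrom, strand, ...). The grouping key is hypothetical.
"""
from collections import defaultdict


def isoforms_by_gene(lines):
    """Map (geneName, chrom, strand) -> list of transcript names, in input order."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        gene_name, transcript, chrom, strand = line.rstrip("\n").split("\t")[:4]
        groups[(gene_name, chrom, strand)].append(transcript)
    return dict(groups)


if __name__ == "__main__":
    # Toy rows with hypothetical names; only the first four columns matter here.
    rows = [
        "GENEA\tTX1\tchr1\t+\t100\t500",
        "GENEA\tTX2\tchr1\t+\t120\t480",
        "GENEB\tTX3\tchr2\t-\t900\t1500",
    ]
    for key, txs in isoforms_by_gene(rows).items():
        print(key, txs)
```

Any key with more than one transcript is a candidate set of alternatively spliced isoforms, and the same key could serve as a gene-level parent when building a GFF3 gene/mRNA/exon hierarchy.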
