Good Morning Daniel:

If you can use the kent command line utilities available at:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

the genePredToGtf command can give you better GTF files than
the table browser does.  UCSC does not keep gene structures
in GTF format.  We use a single-line format in which all the
information about a gene (transcript) is on one line,
the genePred format:
http://genome.ucsc.edu/FAQ/FAQformat.html#format9
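
To make the single-line layout concrete, here is a rough sketch that
expands one genePred record into one line per exon.  The record is made
up for illustration, and the file names example.gp and exons.tsv are
hypothetical; only the ten basic genePred columns are assumed.

```shell
# Hypothetical genePred record (tab-separated); the values are made up.
# Columns: name chrom strand txStart txEnd cdsStart cdsEnd
#          exonCount exonStarts exonEnds
printf 'tx1\tchr1\t+\t100\t900\t150\t850\t3\t100,400,700,\t200,500,900,\n' > example.gp

# Split the comma-separated exon lists and print one line per exon:
# name, chrom, strand, exonStart, exonEnd (coordinates left as stored).
awk -F'\t' '{
    split($9, starts, ",")
    split($10, ends, ",")
    for (i = 1; i <= $8; i++)
        print $1, $2, $3, starts[i], ends[i]
}' OFS='\t' example.gp > exons.tsv

cat exons.tsv
```

Note that genePred start coordinates are zero-based, while GTF is
one-based, which is one of the things genePredToGtf takes care of.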

To use the kent commands, add this three-line file ".hg.conf" to your home
directory:

$ cat .hg.conf
db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password

And set the permissions:
$ chmod 600 .hg.conf

Now you can use the command to extract GTF files directly from the UCSC 
database.
For example, fetch the UCSC gene track from hg19 into the local file 
knownGene.gtf:

$ genePredToGtf hg19 knownGene knownGene.gtf

Note the usage message from the command:

> genePredToGtf - Convert genePred table or file to gtf.
> usage:
>    genePredToGtf database genePredTable output.gtf
> If database is 'file' then track is interpreted as a file
> rather than a table in database.
> options:
>    -utr - Add 5UTR and 3UTR features
>    -honorCdsStat - use cdsStartStat/cdsEndStat when defining start/end
>     codon records
>    -source=src set source name to uses
>    -addComments - Add comments before each set of transcript records.
>     allows for easier visual inspection
> Note: use refFlat or extended genePred table to include geneName
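
Following that last note, a GTF built from refFlat should let you group
transcripts by gene.  The sketch below is only an illustration: the GTF
lines are made up (they stand in for real output), the file names
sample.gtf and gene2tx.tsv are hypothetical, and it assumes the gene name
ends up in the gene_id attribute, which you should check against your own
output.

```shell
# The real file would come from something like (not run here):
#   genePredToGtf hg19 refFlat refFlat.gtf

# Made-up GTF lines standing in for refFlat.gtf.
{
  printf 'chr1\texample\texon\t101\t200\t.\t+\t.\tgene_id "GENEA"; transcript_id "NM_A1";\n'
  printf 'chr1\texample\texon\t401\t500\t.\t+\t.\tgene_id "GENEA"; transcript_id "NM_A1";\n'
  printf 'chr1\texample\texon\t101\t250\t.\t+\t.\tgene_id "GENEA"; transcript_id "NM_A2";\n'
  printf 'chr2\texample\texon\t11\t90\t.\t-\t.\tgene_id "GENEB"; transcript_id "NM_B1";\n'
} > sample.gtf

# Pull gene_id and transcript_id out of column 9 and keep the unique pairs.
awk -F'\t' '{
    gene = ""; tx = ""
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) {
        a = attrs[i]
        gsub(/^ +| +$/, "", a)            # trim spaces around the attribute
        if (a ~ /^gene_id /)       { gene = a; sub(/^gene_id /, "", gene); gsub(/"/, "", gene) }
        if (a ~ /^transcript_id /) { tx = a; sub(/^transcript_id /, "", tx); gsub(/"/, "", tx) }
    }
    if (gene != "" && tx != "") print gene "\t" tx
}' sample.gtf | sort -u > gene2tx.tsv

cat gene2tx.tsv
```

Transcripts that share the first column of gene2tx.tsv would then be
isoforms of the same gene.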

You can also fetch the database text dump of the genePred content for
the track to have the file on-hand locally:

ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz
The SQL structure:
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.sql
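
With the dump on hand you can use the 'file' mode from the usage message
above.  This is a rough sketch, not a tested recipe: the sample row is
made up, the file names are hypothetical, and it assumes the first ten
columns of knownGene.txt are the genePred fields, with the remaining
columns (see knownGene.sql) trimmed off before conversion.

```shell
# Download step (not run here):
#   wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz
#   gunzip knownGene.txt.gz

# Made-up stand-in for one row of the decompressed knownGene.txt (12 columns).
printf 'uc999xyz.1\tchr1\t+\t100\t900\t150\t850\t2\t100,700,\t200,900,\tP00000\tuc999xyz.1\n' > knownGene.sample.txt

# Keep only the ten basic genePred columns (an assumption about the layout).
cut -f1-10 knownGene.sample.txt > knownGene.sample.gp

# Convert only if the kent utility is actually installed.
if command -v genePredToGtf >/dev/null 2>&1; then
    genePredToGtf file knownGene.sample.gp knownGene.sample.gtf
else
    echo "genePredToGtf not found; skipping conversion" >&2
fi
```

No database connection or .hg.conf should be needed in this mode, since
nothing is read from the UCSC MySQL server.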

--Hiram

Daniel Standage wrote:
> Hi all.
> 
> I just downloaded gene structure annotations in GTF for hg19. I've noticed
> that the gene_id and transcript_id for each feature are the same. How does
> one know whether two transcript belong to the same gene (alternatively
> spliced isoforms)?
> 
> Thanks!
> 
> PS I ask because I am trying to convert the data into GFF3 format. I assume
> that you have made a conscious choice not to share data in this format and
> that I have not missed it. I was able to find  a few scripts (via Google) to
> do the GTF -> GFF3 conversion, but they all include a caveat about GTF from
> UCSC. So now I'm creating my own...
> 
> --
> Daniel S. Standage
> Graduate Research Assistant
> Bioinformatics and Computational Biology Program
> Department of Genetics, Development, and Cell Biology
> Iowa State University
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
