Re: [Genome] question regarding gene databases

Jennifer Jackson Fri, 25 Jun 2010 11:24:12 -0700

Hello David,

As an alternative, try using the UCSC Genes track. RefSeq is included 
along with other inputs.

To focus on a single transcript per gene locus, use the primary table
(knownGene) together with knownCanonical, the table that clusters the
transcripts and notes the canonical transcript for each cluster.
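
As a rough sketch of that join from Python (not an official recipe): the
query below pairs each knownCanonical row with its knownGene row through
knownCanonical.transcript = knownGene.name. The cdsStart < cdsEnd filter
for coding transcripts is an assumption about how non-coding rows are
stored, so please check it against your mirror. The function names, the
in-memory SQLite stand-in, and its rows are all made up for illustration;
against a real mirror you would pass your own database connection.

```python
import sqlite3

# One row per knownCanonical cluster, restricted to coding transcripts.
# Assumption: a row with cdsStart == cdsEnd has no annotated coding region.
CANONICAL_CODING_SQL = """
SELECT kg.name, kg.chrom, kg.strand, kg.txStart, kg.txEnd, kc.clusterId
FROM knownCanonical AS kc
JOIN knownGene AS kg ON kg.name = kc.transcript
WHERE kg.cdsStart < kg.cdsEnd
ORDER BY kg.chrom, kg.txStart
"""


def canonical_coding_transcripts(conn):
    """Run the join on a DB-API connection and return the rows as tuples."""
    cur = conn.cursor()
    cur.execute(CANONICAL_CODING_SQL)
    return [tuple(row) for row in cur.fetchall()]


def make_toy_db():
    """Hypothetical in-memory stand-in for a mirror; every row is made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                                txStart INT, txEnd INT,
                                cdsStart INT, cdsEnd INT);
        CREATE TABLE knownCanonical (chrom TEXT, chromStart INT,
                                     chromEnd INT, clusterId INT,
                                     transcript TEXT, protein TEXT);
        INSERT INTO knownGene VALUES
            ('uc_demo1.1', 'chr1', '+', 100, 500, 150, 450),
            ('uc_demo1.2', 'chr1', '+', 120, 480, 160, 440),
            ('uc_demo2.1', 'chr1', '-', 900, 1200, 900, 900);
        INSERT INTO knownCanonical VALUES
            ('chr1', 100, 500, 1, 'uc_demo1.1', 'demoProtein'),
            ('chr1', 900, 1200, 2, 'uc_demo2.1', '');
    """)
    return conn


if __name__ == "__main__":
    for row in canonical_coding_transcripts(make_toy_db()):
        print(row)
```

In the toy data, uc_demo1.2 is dropped because it is not the canonical
transcript of its cluster, and uc_demo2.1 is dropped by the coding filter.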

You can link the internal transcript identifiers back to alternate
identifiers (including RefSeq) using the table kgAlias. The table kgXref
can also be used, but be aware that the linked IDs in that table may be
only "associated" identifiers, not alternate identifiers specific to
this transcript.
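
A hedged sketch of both lookups, for comparison: one query reads the
refseq column of kgXref, the other collects every kgAlias entry for the
same transcripts, joining on kgID in each case. The toy tables carry only
the columns used here, and their rows, like the function names, are
invented for illustration.

```python
import sqlite3

# RefSeq accession and gene symbol per canonical transcript, via kgXref.
XREF_SQL = """
SELECT kc.transcript, x.geneSymbol, x.refseq
FROM knownCanonical AS kc
JOIN kgXref AS x ON x.kgID = kc.transcript
ORDER BY kc.transcript
"""

# Every alias recorded for each canonical transcript, via kgAlias.
ALIAS_SQL = """
SELECT a.kgID, a.alias
FROM knownCanonical AS kc
JOIN kgAlias AS a ON a.kgID = kc.transcript
ORDER BY a.kgID, a.alias
"""


def canonical_xrefs(conn):
    """Return (transcript, geneSymbol, refseq) tuples from kgXref."""
    cur = conn.cursor()
    cur.execute(XREF_SQL)
    return [tuple(row) for row in cur.fetchall()]


def canonical_aliases(conn):
    """Return {kgID: [alias, ...]} built from kgAlias."""
    cur = conn.cursor()
    cur.execute(ALIAS_SQL)
    aliases = {}
    for kg_id, alias in cur.fetchall():
        aliases.setdefault(kg_id, []).append(alias)
    return aliases


def make_toy_xref_db():
    """Hypothetical in-memory tables with made-up rows, subset of columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownCanonical (clusterId INT, transcript TEXT);
        CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT, refseq TEXT);
        CREATE TABLE kgAlias (kgID TEXT, alias TEXT);
        INSERT INTO knownCanonical VALUES (1, 'uc_demo1.1');
        INSERT INTO kgXref VALUES
            ('uc_demo1.1', 'DEMOGENE', 'NM_DEMO1'),
            ('uc_demo1.2', 'DEMOGENE', 'NM_DEMO1');
        INSERT INTO kgAlias VALUES
            ('uc_demo1.1', 'NM_DEMO1'),
            ('uc_demo1.1', 'DEMOGENE'),
            ('uc_demo1.2', 'DEMOGENE');
    """)
    return conn


if __name__ == "__main__":
    db = make_toy_xref_db()
    print(canonical_xrefs(db))
    print(canonical_aliases(db))
```

Comparing the two results for a transcript is one way to spot kgXref
entries that are only "associated" with it.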

To see the linkage between tables for this or any other track, use the
Table browser (perhaps the one at UCSC, if it was not included in your
mirror). Bring up the Table browser, navigate to the genome and track of
interest, and, leaving the primary table selected, click the button
beside it called "describe table schema".

The resulting page will define that primary table and list all
associated tables (along with notes about how they are linked). Clicking
on any of those tables brings it to the "top": its own schema is
defined and its related tables are listed. This is how we share the
overall schema design with users.

I hope this information is helpful.  Please feel free to contact the
help mailing list again if you require further assistance.

Best regards,
Jen

UCSC Genome Browser Support
http://genome.ucsc.edu/contacts.html
[email protected]  [email protected]

On 6/25/10 8:38 AM, David Alexander wrote:
> Hi all,
>
> I am trying to partition the human genome into a set of disjoint
> regions, such that any particular SNP marker belongs to a single
> region.  As a first cut I would like these regions to correspond to
> known protein-coding genes and their upstream regions.
>
> I wrote some Python&SQL code to query my personal mirror of the hg19
> database, but have had trouble using the refGene table.  It seems that
> refGene is a table of transcripts with information on where they align
> in the genome.  In particular this means that some of the rows in
> refGene align to multiple places in the genome.  Is there a way to
> extract a canonical set of known, coding, uniquely-aligned genes from
> refGene or some other table in your database?
>
> Thank you!
> David Alexander
> UCLA Department of Biomathematics
> http://dalexander.bol.ucla.edu/
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
