Hi all, I am trying to partition the human genome into a set of disjoint regions, such that any particular SNP marker belongs to a single region. As a first cut I would like these regions to correspond to known protein-coding genes and their upstream regions.
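To make the goal concrete, here is a rough sketch of the partitioning I have in mind. The 5 kb upstream window, the function names, the toy gene records, and the choice to merge overlapping genes into a single region are all placeholder assumptions rather than a settled design. Coordinates are assumed to be 0-based and half-open, as in the UCSC tables.

```python
from bisect import bisect_right
from collections import defaultdict

UPSTREAM_BP = 5000  # hypothetical upstream window; not a recommended value


def extend_upstream(start, end, strand, upstream=UPSTREAM_BP):
    """Return the half-open interval covering a gene plus its upstream flank."""
    if strand == "+":
        return max(0, start - upstream), end
    # On the minus strand, "upstream" lies at higher coordinates.
    return start, end + upstream


def build_regions(genes, upstream=UPSTREAM_BP):
    """Build disjoint regions per chromosome.

    genes: iterable of (chrom, start, end, strand, name).
    Returns {chrom: [(start, end, [names]), ...]}, sorted by start.
    Extended genes that overlap are merged into one region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, strand, name in genes:
        s, e = extend_upstream(start, end, strand, upstream)
        by_chrom[chrom].append((s, e, name))

    regions = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for s, e, name in intervals:
            if merged and s < merged[-1][1]:
                # Overlaps the previous region: widen it and record the gene.
                prev_s, prev_e, names = merged[-1]
                merged[-1] = (prev_s, max(prev_e, e), names + [name])
            else:
                merged.append((s, e, [name]))
        regions[chrom] = merged
    return regions


def find_region(regions, chrom, pos):
    """Return the region containing 0-based position pos, or None."""
    intervals = regions.get(chrom, [])
    # Rebuilt on every call for brevity; cache this list for many lookups.
    starts = [s for s, _, _ in intervals]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return intervals[i]
    return None


# Toy records with made-up names and coordinates, for illustration only.
genes = [
    ("chr1", 10000, 20000, "+", "GENE_A"),
    ("chr1", 18000, 30000, "-", "GENE_B"),
    ("chr1", 100000, 110000, "+", "GENE_C"),
]
regions = build_regions(genes)
```

With this layout a SNP falls in at most one region, and a SNP outside every region (for example `find_region(regions, "chr1", 50000)`) comes back as `None`, so intergenic markers would still need their own treatment.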
I wrote some Python and SQL code to query my personal mirror of the hg19 database, but I have had trouble using the refGene table. It seems that refGene is a table of transcripts with information on where they align in the genome. In particular, this means that some of the transcripts in refGene align to multiple places in the genome. Is there a way to extract a canonical set of known, coding, uniquely-aligned genes from refGene or some other table in your database?

Thank you!

David Alexander
UCLA Department of Biomathematics
http://dalexander.bol.ucla.edu/
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
