Is there a better option, then? Something curated?

Michael

-----Original Message-----
From: Brooke Rhead [mailto:[email protected]]
Sent: Friday, October 07, 2011 5:52 PM
To: Rusch, Michael
Cc: '[email protected]'
Subject: Re: [Genome] genes with disparate loci in refFlat

Hi Michael,

The RefSeq Genes track is made by aligning RefSeq sequences to the genome using BLAT. You can click on the blue "RefSeq Genes" link on the main Genome Browser page to read the track description. In part, it says:

"RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept."

So, it is expected that some sequences will align very well in multiple locations. One explanation for what you are seeing is duplication events in the genome. You might try turning on the "Segmental Dups" track (in the Variation and Repeats track group). Both of your example regions show activity in that track.

If you have further questions, please contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 10/6/11 7:27 AM, Rusch, Michael wrote:
> I've found some things in refFlat that I don't understand. Perhaps somebody can help shed some light on this.
>
> Intuitively it seemed to me that in most circumstances, all of the records with the same geneName should be in about the same place, and certainly in the same orientation on the same chromosome. However, I have found several situations where this is not the case. Some of these make sense to me, for example, genes in the PARs have records on both chrX and chrY. Also, there are several that have some records on the "hap" sequences. These I can understand. Others truly puzzle me. Maybe somebody can help me interpret.
>
> First example is MAGEA2. This gene has two locations on chrX:
> MAGEA2 chrX - 151918388 151922364 3
> MAGEA2 chrX + 151883119 151887095 3
>
> I don't understand how the same gene could be in two different places?
>
> In some cases they are even on different chromosomes.
>
> In many cases, there seem to be duplicates with different geneName/names. For example:
>
> MIR4509-1 NR_039732 chr15 - 22675147 22675241
> MIR4509-2 NR_039733 chr15 - 22675147 22675241
> MIR4509-3 NR_039734 chr15 - 22675147 22675241
> MIR4509-1 NR_039732 chr15 + 28671636 28671730
> MIR4509-2 NR_039733 chr15 + 28671636 28671730
> MIR4509-3 NR_039734 chr15 + 28671636 28671730
> MIR4509-1 NR_039732 chr15 - 28735897 28735991
> MIR4509-2 NR_039733 chr15 - 28735897 28735991
> MIR4509-3 NR_039734 chr15 - 28735897 28735991
>
> In this case, there are three geneName/name combinations, and three loci, and each geneName/name has a record in each locus.
>
> There are hundreds of these that I've found.
>
> I get the impression that I'm not using this data correctly, and perhaps there would be a better table to be using for the purpose of locating genes and annotated transcripts on the genome. Can anybody explain this to me?
> Michael
>
> ________________________________
> Email Disclaimer: www.stjude.org/emaildisclaimer
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
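
For anyone wanting to reproduce the check described in the original question, here is a minimal Python sketch of one way to list the geneNames whose refFlat records fall in more than one locus. It assumes tab-separated rows in the usual refFlat column order (geneName, name, chrom, strand, txStart, txEnd, ...) and reads only the first six columns. The function names are hypothetical, as are the `NM_EXAMPLE` accession and the `GENE_A` rows; the MAGEA2 and MIR4509-1 coordinates are taken from the message above. This is not how the track itself is built, only a way to spot multi-locus genes.

```python
from collections import defaultdict


def parse_refflat(lines):
    """Yield (geneName, name, chrom, strand, txStart, txEnd) from refFlat rows.

    Assumes tab-separated text; columns after txEnd are ignored.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        gene, name, chrom, strand = fields[0:4]
        yield gene, name, chrom, strand, int(fields[4]), int(fields[5])


def loci_by_gene(records):
    """Map geneName -> list of merged (chrom, strand, start, end) loci.

    Records on the same chromosome and strand that overlap or abut are
    merged into one locus; everything else counts as a separate locus.
    """
    spans = defaultdict(list)
    for gene, _name, chrom, strand, start, end in records:
        spans[gene].append((chrom, strand, start, end))

    merged = {}
    for gene, items in spans.items():
        loci = []
        # Sorting puts records with the same chrom and strand next to each
        # other, ordered by start position.
        for chrom, strand, start, end in sorted(items):
            last = loci[-1] if loci else None
            if last and last[0] == chrom and last[1] == strand and start <= last[3]:
                loci[-1] = (chrom, strand, last[2], max(last[3], end))
            else:
                loci.append((chrom, strand, start, end))
        merged[gene] = loci
    return merged


def disparate_genes(records):
    """Return only the genes whose records occupy more than one locus."""
    return {g: loci for g, loci in loci_by_gene(records).items() if len(loci) > 1}


# Example rows. NM_EXAMPLE and the GENE_A rows are placeholders, not real data.
rows = [
    "MAGEA2\tNM_EXAMPLE\tchrX\t-\t151918388\t151922364",
    "MAGEA2\tNM_EXAMPLE\tchrX\t+\t151883119\t151887095",
    "MIR4509-1\tNR_039732\tchr15\t-\t22675147\t22675241",
    "MIR4509-1\tNR_039732\tchr15\t+\t28671636\t28671730",
    "MIR4509-1\tNR_039732\tchr15\t-\t28735897\t28735991",
    "GENE_A\tNM_EXAMPLE\tchr1\t+\t100\t500",
    "GENE_A\tNM_EXAMPLE\tchr1\t+\t300\t800",
]

result = disparate_genes(parse_refflat(rows))
for gene, loci in sorted(result.items()):
    print(gene, loci)
```

With a real file, the same call would be `disparate_genes(parse_refflat(open(path)))`, where `path` points at a downloaded refFlat.txt. Genes in the PARs or on the "hap" sequences will also show up in the result, so those would need to be filtered separately if they are not of interest.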
