Hi Maayan,

One of our engineers has offered this further explanation:

The UCSC RefGene track contains BLAT alignments of the RefSeq mRNA and RNA entries. These RefSeq entries are transcript sequences, not genomic annotations, and are independent of any given assembly. The UCSC RefGene alignments are analogous to, but not the same as, the genomic mappings of these transcripts produced by NCBI. NCBI uses a different alignment process than UCSC, and the two processes don't always agree.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 11/24/10 19:34, maayan kreitzman wrote:
> Hi All,
>
> The explanation supplied is not adequate.
> The RefSeq project supplies information on transcripts that are unique - and
> on the NCBI (which created refseq), indeed, there is only ONE record per
> accession. (Try a simple search in Entrez). Indeed, the refseq project often
> supplies multiple accessions for the same or similar loci with various
> splices. That's the whole point. It's a conservative approach - one name,
> one transcript.
> There is a mistake in the adaptation of their database to yours. Your
> explanation makes no sense unless you went and did all the alignments and
> selection from scratch - and if that's the case, why would you call it a
> RefSeq track?
>
> maayan
>
>
> On Thu, Nov 25, 2010 at 12:08 AM, Pauline Fujita <[email protected]> wrote:
>
>> Hello Maayan,
>>
>> Please see this previously answered mailing list question about the same
>> issue:
>>
>> https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html
>>
>> Hopefully this information was helpful and answers your question. If you
>> have further questions or require clarification feel free to contact the
>> mailing list at [email protected].
>>
>> Regards,
>>
>> Pauline Fujita
>> UCSC Genome Bioinformatics Group
>> http://genome.ucsc.edu
>>
>>
>>
>> On 11/24/10 01:08, maayan kreitzman wrote:
>>
>>> Hi there,
>>> I've found a kind of serious problem with your database which is based on
>>> the RefSeq project.
>>> Many of the refseq accessions, when queried from the genome browser return
>>> more than one gene, IN COMPLETELY DIFFERENT LOCATIONS.
>>> If you search, say, NM_198181, this is the case. Sometimes, like in the
>>> case of NM_020364, the different entries are even on opposite strands.
>>> If you want a longer list of examples like this, I can send you some more.
>>> The mistake is somewhere in the conversion from the RefSeq database to
>>> your software, because if you search the same accessions in Entrez you get,
>>> as expected, ONE gene.
>>> RefSeq documents specific, unique, verified transcripts. There should not
>>> be more than one set of coordinates for each refseq accession.
>>> maayan
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
