Hi Maayan, I will pass your suggestion along to our developers.
--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 11/29/10 13:37, maayan kreitzman wrote:
> Thanks for the clarification.
>
> If that's the case though, perhaps it would be good to reconsider the
> rationale of the track. Considering that RefSeq is chiefly an annotation and
> curation project, it doesn't make sense to me that you would take only the
> transcript sequences from RefSeq, then re-align and re-annotate them - and
> then call the resulting track "RefSeq". (Since it doesn't actually agree with
> RefSeq itself.) Transcript sequences are a dime a dozen; it's the curation
> and annotation processes that distinguish one database's set of genes from
> another. So when a user sees a track called "RefSeq" (or whatever else),
> they would expect the information there to represent that database, not a
> reworking of the database's raw data.
> That's my suggestion, anyway.
>
>
> On Mon, Nov 29, 2010 at 9:58 PM, Brooke Rhead <[email protected]> wrote:
>
>> Hi Maayan,
>>
>> One of our engineers has offered this further explanation:
>>
>> The UCSC RefGene track contains BLAT alignments of the RefSeq mRNA and RNA
>> entries. These RefSeq entries are transcript sequences, not genomic
>> annotations, and are independent of any given assembly. The UCSC RefGene
>> alignments are analogous to, but not the same as, the genomic mappings of
>> these transcripts produced by NCBI. NCBI uses a different alignment process
>> than UCSC, and the processes don't always agree.
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>> On 11/24/10 19:34, maayan kreitzman wrote:
>>
>>> Hi All,
>>>
>>> The explanation supplied is not adequate.
>>> The RefSeq project supplies information on transcripts that are unique -
>>> and at NCBI (which created RefSeq), indeed, there is only ONE record per
>>> accession. (Try a simple search in Entrez.) Indeed, the RefSeq project
>>> often supplies multiple accessions for the same or similar loci with
>>> various splices. That's the whole point. It's a conservative approach -
>>> one name, one transcript.
>>> There is a mistake in the adaptation of their database to yours. Your
>>> explanation makes no sense unless you went and did all the alignments and
>>> selection from scratch - and if that's the case, why would you call it a
>>> RefSeq track?
>>>
>>> maayan
>>>
>>>
>>> On Thu, Nov 25, 2010 at 12:08 AM, Pauline Fujita <[email protected]> wrote:
>>>
>>>> Hello Maayan,
>>>>
>>>> Please see this previously answered mailing list question about the same
>>>> issue:
>>>>
>>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html
>>>>
>>>> Hopefully this information was helpful and answers your question. If you
>>>> have further questions or require clarification feel free to contact the
>>>> mailing list at [email protected].
>>>>
>>>> Regards,
>>>>
>>>> Pauline Fujita
>>>> UCSC Genome Bioinformatics Group
>>>> http://genome.ucsc.edu
>>>>
>>>>
>>>> On 11/24/10 01:08, maayan kreitzman wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I've found a kind of serious problem with your database which is based
>>>>> on the RefSeq project.
>>>>> Many of the RefSeq accessions, when queried from the genome browser,
>>>>> return more than one gene, IN COMPLETELY DIFFERENT LOCATIONS.
>>>>> If you search, say, NM_198181, this is the case. Sometimes, like in the
>>>>> case of NM_020364, the different entries are even on opposite strands.
>>>>> If you want a longer list of examples like this, I can send you some
>>>>> more.
>>>>> The mistake is somewhere in the conversion from the RefSeq database to
>>>>> your software, because if you search the same accessions in Entrez you
>>>>> get, as expected, ONE gene.
>>>>> RefSeq documents specific, unique, verified transcripts. There should
>>>>> not be more than one set of coordinates for each RefSeq accession.
>>>>>
>>>>> maayan
>>>>> _______________________________________________
>>>>> Genome maillist - [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
