Re: [Genome] tables adapted from refseq

Brooke Rhead Tue, 30 Nov 2010 21:45:54 -0800

Hello again Maayan,

Some more information on this track from our developers:


The RefSeq mRNA/RNA curation and annotation process is done on the
transcripts, not the genome. These entries are *not* raw data and are
released independently of the genomic annotations created using this
data.  In fact, not all of the human RefSeq mRNAs can be mapped to the
GRCh37 genome assembly. For a better understanding of the RefSeq
annotation process, please carefully read:

http://www.ncbi.nlm.nih.gov/books/NBK21091/

The RefSeq alignments at UCSC have been used for nearly 10 years without 
this particular complaint.  As a matter of fact, the RefSeq staff 
actually use these alignments in their work.

So, it is unlikely that we will be making any changes to the RefSeq 
Genes track.
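
For readers who want to see how many accessions align to more than one
place, here is a minimal sketch. It assumes a tab-separated, refGene-style
table whose first columns are bin, name, chrom, strand, txStart, txEnd;
that layout is an assumption and should be checked against the table
schema. The function name multi_mapped, the sample rows, and the NM_DEMO
accessions are invented for illustration and are not real data.

```python
from collections import defaultdict

# Assumed column positions in a refGene-style, tab-separated row.
# Only the first six columns are used; any further columns are ignored.
BIN, NAME, CHROM, STRAND, TX_START, TX_END = range(6)


def multi_mapped(lines):
    """Return {accession: [(chrom, strand, txStart, txEnd), ...]} for
    accessions that appear in more than one alignment row."""
    hits = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        hits[fields[NAME]].append(
            (fields[CHROM], fields[STRAND],
             int(fields[TX_START]), int(fields[TX_END]))
        )
    return {acc: locs for acc, locs in hits.items() if len(locs) > 1}


# Invented rows (hypothetical accessions and coordinates), for illustration.
SAMPLE = [
    "585\tNM_DEMO_1\tchr1\t+\t1000\t5000",
    "600\tNM_DEMO_1\tchr7\t-\t20000\t26000",
    "700\tNM_DEMO_2\tchr2\t+\t300\t900",
]

if __name__ == "__main__":
    for acc, locs in sorted(multi_mapped(SAMPLE).items()):
        print(acc, locs)
```

To run it on a real dump, pass an open file handle (for example one from
gzip.open(path, "rt")) in place of SAMPLE. Accessions it reports are those
with more than one alignment, possibly on different chromosomes or strands.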

--
Brooke Rhead
UCSC Genome Bioinformatics Group


Brooke Rhead wrote on 11/29/10 6:22 PM:
> Hi Maayan,
>
> I will pass your suggestion along to our developers.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> On 11/29/10 13:37, maayan kreitzman wrote:
>> Thanks for the clarification.
>>
>> If that's the case though, perhaps it would be good to reconsider the
>> rationale of the track. Considering that RefSeq is chiefly an annotation and
>> curation project, it doesn't make sense to me that you would take only the
>> transcript sequences from RefSeq, then re-align and re-annotate them - and
>> then call the resulting track "RefSeq". (Since it doesn't actually agree with
>> RefSeq itself.) Transcript sequences are a dime a dozen; it's the curation
>> and annotation processes that distinguish one database's set of genes from
>> another. So when a user sees a track called "RefSeq" (or whatever else),
>> they would expect the information there to represent that database, not a
>> reworking of the database's raw data.
>> That's my suggestion, anyway.
>>
>>
>> On Mon, Nov 29, 2010 at 9:58 PM, Brooke Rhead<[email protected]>  wrote:
>>
>>> Hi Maayan,
>>>
>>> One of our engineers has offered this further explanation:
>>>
>>> The UCSC RefGene track contains BLAT alignments of the RefSeq mRNA and RNA
>>> entries. These RefSeq entries are transcript sequences, not genomic
>>> annotations, and are independent of any given assembly. The UCSC RefGene
>>> alignments are analogous to, but not the same as, the genomic mappings of
>>> these transcripts produced by NCBI. NCBI uses a different alignment process than
>>> UCSC, and the processes don't always agree.
>>>
>>> --
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>
>>> On 11/24/10 19:34, maayan kreitzman wrote:
>>>
>>>>   Hi All,
>>>>
>>>> The explanation supplied is not adequate.
>>>> The RefSeq project supplies information on transcripts that are unique -
>>>> and at NCBI (which created RefSeq), indeed, there is only ONE record per
>>>> accession. (Try a simple search in Entrez). Indeed, the RefSeq project
>>>> often supplies multiple accessions for the same or similar loci with
>>>> various splices. That's the whole point. It's a conservative approach -
>>>> one name, one transcript.
>>>> There is a mistake in the adaptation of their database to yours. Your
>>>> explanation makes no sense unless you went and did all the alignments and
>>>> selection from scratch - and if that's the case, why would you call it a
>>>> RefSeq track?
>>>>
>>>> maayan
>>>>
>>>>
>>>> On Thu, Nov 25, 2010 at 12:08 AM, Pauline Fujita <[email protected]> wrote:
>>>>
>>>>> Hello Maayan,
>>>>> Please see this previously answered mailing list question about the same
>>>>> issue:
>>>>>
>>>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html
>>>>>
>>>>> Hopefully this information was helpful and answers your question. If you
>>>>> have further questions or require clarification feel free to contact the
>>>>> mailing list at [email protected].
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pauline Fujita
>>>>> UCSC Genome Bioinformatics Group
>>>>> http://genome.ucsc.edu
>>>>>
>>>>>
>>>>>
>>>>> On 11/24/10 01:08, maayan kreitzman wrote:
>>>>>
>>>>>> Hi there,
>>>>>> I've found a kind of serious problem with your database which is based
>>>>>> on the RefSeq project.
>>>>>> Many of the RefSeq accessions, when queried from the genome browser,
>>>>>> return more than one gene, IN COMPLETELY DIFFERENT LOCATIONS.
>>>>>> If you search, say, NM_198181, this is the case. Sometimes, as in the
>>>>>> case of NM_020364, the different entries are even on opposite strands.
>>>>>> If you want a longer list of examples like this, I can send you some
>>>>>> more.
>>>>>> The mistake is somewhere in the conversion from the RefSeq database to
>>>>>> your software, because if you search the same accessions in Entrez you
>>>>>> get, as expected, ONE gene.
>>>>>> RefSeq documents specific, unique, verified transcripts. There should
>>>>>> not be more than one set of coordinates for each RefSeq accession.
>>>>>> maayan
>>>>>> _______________________________________________
>>>>>> Genome maillist  -  [email protected]
>>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>>>
>>>>>>
