Re: [Genome] tables adapted from refseq

Brooke Rhead Mon, 29 Nov 2010 18:23:55 -0800

Hi Maayan,

I will pass your suggestion along to our developers.


--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 11/29/10 13:37, maayan kreitzman wrote:
> Thanks for the clarification.
> 
> If that's the case, though, perhaps it would be good to reconsider the
> rationale of the track. Considering that RefSeq is chiefly an annotation and
> curation project, it doesn't make sense to me that you would take only the
> transcript sequences from RefSeq, re-align and re-annotate them, and
> then call the resulting track "RefSeq" (since it doesn't actually agree with
> RefSeq itself). Transcript sequences are a dime a dozen; it's the curation
> and annotation processes that distinguish one database's set of genes from
> another. So when a user sees a track called "RefSeq" (or whatever else),
> they would expect the information there to represent that database, not a
> reworking of the database's raw data.
> That's my suggestion, anyway.
> 
> 
> On Mon, Nov 29, 2010 at 9:58 PM, Brooke Rhead <[email protected]> wrote:
> 
>> Hi Maayan,
>>
>> One of our engineers has offered this further explanation:
>>
>> The UCSC RefGene track contains BLAT alignments of the RefSeq mRNA and RNA
>> entries. These RefSeq entries are transcript sequences, not genomic
>> annotations, and are independent of any given assembly. The UCSC RefGene
>> alignments are analogous to, but not the same as, the genomic mappings of
>> these transcripts produced by NCBI. NCBI uses a different alignment process
>> than UCSC, and the processes don't always agree.
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> On 11/24/10 19:34, maayan kreitzman wrote:
>>
>>> Hi All,
>>>
>>> The explanation supplied is not adequate.
>>> The RefSeq project supplies information on transcripts that are unique,
>>> and at the NCBI (which created RefSeq) there is indeed only ONE record
>>> per accession. (Try a simple search in Entrez.) Indeed, the RefSeq
>>> project often supplies multiple accessions for the same or similar loci
>>> with various splices. That's the whole point. It's a conservative
>>> approach: one name, one transcript.
>>> There is a mistake in the adaptation of their database to yours. Your
>>> explanation makes no sense unless you went and did all the alignments and
>>> selection from scratch, and if that's the case, why would you call it a
>>> RefSeq track?
>>>
>>> maayan
>>>
>>>
>>> On Thu, Nov 25, 2010 at 12:08 AM, Pauline Fujita <[email protected]> wrote:
>>>
>>>> Hello Maayan,
>>>> Please see this previously answered mailing list question about the same
>>>> issue:
>>>>
>>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html
>>>>
>>>> Hopefully this information was helpful and answers your question. If you
>>>> have further questions or require clarification feel free to contact the
>>>> mailing list at [email protected].
>>>>
>>>> Regards,
>>>>
>>>> Pauline Fujita
>>>> UCSC Genome Bioinformatics Group
>>>> http://genome.ucsc.edu
>>>>
>>>>
>>>>
>>>> On 11/24/10 01:08, maayan kreitzman wrote:
>>>>
>>>>> Hi there,
>>>>> I've found a kind of serious problem with your database, which is based
>>>>> on the RefSeq project.
>>>>> Many of the RefSeq accessions, when queried from the genome browser,
>>>>> return more than one gene, IN COMPLETELY DIFFERENT LOCATIONS.
>>>>> If you search, say, NM_198181, this is the case. Sometimes, as in the
>>>>> case of NM_020364, the different entries are even on opposite strands.
>>>>> If you want a longer list of examples like this, I can send you some
>>>>> more.
>>>>> The mistake is somewhere in the conversion from the RefSeq database to
>>>>> your software, because if you search the same accessions in Entrez you
>>>>> get, as expected, ONE gene.
>>>>> RefSeq documents specific, unique, verified transcripts. There should
>>>>> not be more than one set of coordinates for each RefSeq accession.
>>>>> maayan
>>>>> _______________________________________________
>>>>> Genome maillist  -  [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>>
>>>>>
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
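
A note for readers who want to check which accessions have more than one
alignment, as discussed in this thread: the following is a minimal Python
sketch, not UCSC's own tooling. It assumes a tab-separated dump whose first six
columns are bin, name, chrom, strand, txStart, txEnd (believed to match the
refGene table layout, but check the table schema before relying on it). The
function name multi_mapped and the sample rows are hypothetical; the
accessions and coordinates in SAMPLE_ROWS are invented, not real data.

```python
import csv
from collections import defaultdict


def multi_mapped(lines):
    """Return {accession: sorted placements} for accessions seen at >1 location.

    `lines` is any iterable of tab-separated text rows. The column order
    (bin, name, chrom, strand, txStart, txEnd, ...) is an ASSUMPTION about
    the dump being read; adjust the unpacking if your file differs.
    """
    placements = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment/header lines, and rows that are too short.
        if len(row) < 6 or row[0].startswith("#"):
            continue
        _bin, name, chrom, strand, tx_start, tx_end = row[:6]
        placements[name].add((chrom, strand, int(tx_start), int(tx_end)))
    # Keep only accessions with more than one distinct placement.
    return {
        name: sorted(locs)
        for name, locs in placements.items()
        if len(locs) > 1
    }


# Invented rows for illustration only (hypothetical accessions/coordinates).
SAMPLE_ROWS = [
    "585\tNM_TEST_A\tchr1\t+\t1000\t5000",
    "600\tNM_TEST_A\tchr7\t-\t20000\t26000",
    "585\tNM_TEST_B\tchr2\t+\t300\t900",
]

if __name__ == "__main__":
    for accession, locs in multi_mapped(SAMPLE_ROWS).items():
        print(accession, locs)
```

With a real dump, the same function can be given an open file handle, for
example multi_mapped(open("refGene.txt")), where the file name is again only an
assumption. Accessions reported this way are the ones where the alignment
process placed one transcript sequence at several genomic locations.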
