Hello Phoenix,

It appears that the data comparison in your example is between two 
different genomic assemblies. Specifically, the UCSC Browser data is 
from NCBI Build 36.1 and the Entrez data is from NCBI Build 37.1.

The UCSC Genome Browser was very recently updated to include the 
February 2009 human reference sequence (GRCh37, labeled by NCBI as Build 37.1). 
This release is labeled "February 2009 - hg19" in the UCSC Genome 
Browser Gateway/Table Browser/Download/mySQL areas. However, please be 
aware that the UCSC Browser's default human assembly is still Mar. 2006 
(hg18). We are actively constructing some major tracks for the hg19 
assembly and will change the default sometime soon.

NCBI also recently updated to the same new human assembly (GRCh37) and 
labeled it Build 37.1. This is their default human assembly.

Please double-check that all data are from the same version of the genomic 
assembly, and get back to us if you still notice major coordinate differences.
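
As a quick sanity check, the numbers from your NM_152486 example can be 
compared directly. The sketch below is only illustrative: the helper names 
(Span, from_ucsc, length) are made up for this message, and it assumes the 
Entrez Gene values are 1-based and inclusive, while the refGene txStart is 
0-based with an exclusive txEnd, as in the genePred format.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # first base, 1-based
    end: int    # last base, 1-based, inclusive


def from_ucsc(tx_start: int, tx_end: int) -> Span:
    """Convert a UCSC table interval (0-based start, exclusive end)
    to a 1-based, inclusive interval."""
    return Span(tx_start + 1, tx_end)


def length(span: Span) -> int:
    """Number of bases covered by a 1-based, inclusive interval."""
    return span.end - span.start + 1


# Values quoted in the example for NM_152486.
# Assumption: the Entrez Gene numbers are 1-based and inclusive.
ncbi = Span(850984, 869824)
ucsc = from_ucsc(861120, 879961)  # txStart/txEnd from the refGene file

start_shift = ucsc.start - ncbi.start
end_shift = ucsc.end - ncbi.end

print(length(ncbi), length(ucsc))  # genomic span reported by each source
print(start_shift, end_shift)      # offset of each end between the sources
```

Under those assumptions both sources report the same genomic span and both 
ends are shifted by the same amount. That pattern is what one would expect 
when the same alignment is reported on two different assemblies; it is far 
larger than the one-base difference that the 0-based start alone would cause.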

Thank you for the example; it was very helpful.
Jennifer Jackson
UCSC Genome Bioinformatics Group

Kwan, Phoenix wrote:
> Hi Jennifer,
>
> What I have found is that the locations of almost all human's NM transcripts 
> in RefGene file are off from NCBI's by usually at least thousands of bp.  For 
> example, NM_152486's transcript starts at 850984 and end at 869824 which I 
> looked it up on NCBI's Entrez Gene.  But in RefGene file that I downloaded 
> yesterday, it starts from 861120 and ends at 879961.  However, I looked this 
> gene up from a RefGene file that I downloaded 2 weeks ago, the data match 
> with what are in NCBI currently.
>
>
> Thank you,
> Phoenix
>
> -----Original Message-----
> From: Jennifer Jackson [mailto:[email protected]]
> Sent: Tuesday, May 05, 2009 6:41 PM
> To: Kwan, Phoenix
> Cc: [email protected]
> Subject: Re: [Genome] genomic locations in RefGene file
>
> Hello,
>
> RefSeq sequences are independently aligned using BLAT and the
> coordinates are based on complete chromosomes. The genomic
> version/source is noted on the gateway page for each assembly. Some
> differences in alignment position are known and expected for this track,
> but most should be the same or very similar for the same version of an
> assembly and query RefSeq.
>
> The UCSC Browser uses a different method of storing coordinates than
> NCBI. This may be the source of the discrepancy. Please read the
> documentation below and if you still have some questions, send a few
> examples (database, refseqID, NCBI coordinates as you interpret them,
> UCSC coordinates as you interpret them) for review and feedback.
>
> The main table for the RefSeq Genes track is called refGene and is in
> genePred format
> http://genome.ucsc.edu/FAQ/FAQformat#format9
>
> Description of UCSC Browser coordinate system
> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
>
>
> Kwan, Phoenix wrote:
>   
>> Hi,
>>
>> I have found that most of the locations for the NM transcripts in the 
>> RefGene file do not match with what are in NCBI.  Are the positions in the 
>> RefGene file genomic locations relative to the full length of a chromosome?  
>> Or something else?
>>
>> Thank you very much for your time,
>> Phoenix Kwan
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>>     