Re: [Genome] About NCBI gene coordinate from Gene ID or refseqID

John, Shibu Tue, 08 Feb 2011 16:51:42 -0800

Hi Greg,

Thanks for the detailed information.
In the end I have 8726 "NM" IDs that do not match any entry in UCSC.
******example**
NM_001002792
NM_001003669
NM_001003914
NM_001003930
NM_001004138
NM_001013392
NM_001013793
NM_001013800
NM_001013808
NM_001014425
NM_001024731
NM_001024838
NM_001024839
NM_001024840
NM_001024849
NM_001025208
NM_001025241
NM_001030310
NM_001033001
NM_001033127
NM_001033151
NM_001033160
NM_001033198
*******
Thanks,
Shibu

________________________________________
From: Greg Roe [[email protected]]
Sent: 08 February 2011 19:56
To: John, Shibu
Cc: [email protected]
Subject: Re: [Genome] About NCBI gene coordinate from Gene ID or refseqID

Hi Shibu,

The easiest way to do this would be to use the Table Browser
(http://genome.ucsc.edu/cgi-bin/hgTables).  Select the assembly of
interest, mm9 I assume. Then select:

Group: Genes and Gene Prediction Tracks
Track: RefSeq Genes
Table: refGene
Region: genome

Then under identifiers click upload list.  You'll need to make a list of
all the gene ids, without all the extra data (ex: NM_020501).  You'll
need to remove the version numbering as well. So NM_020501.1 should be
shown as NM_020501, without the .1. The easiest low-tech way to do this
would be to load your data in a spreadsheet using the pipe as a column
delimiter, removing the extra columns, then doing a find and replace to
remove all the version designations, ".1", etc. Then save the gene ids
to a text file and upload that.
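
For anyone who would rather script this cleanup than use a spreadsheet, here is a minimal Python sketch. It assumes every input line follows the pipe-delimited layout from the original question (gi|number|ref|accession.version|). The function names are illustrative only and are not part of any UCSC tool.

```python
def strip_accession(line):
    """Return the unversioned accession from a line such as
    'gi|10048425|ref|NM_020501.1|', or None if the line does not fit."""
    fields = line.strip().split("|")
    # Expected layout: gi | <gi number> | ref | <accession.version> | (empty)
    if len(fields) < 4 or fields[2] != "ref" or not fields[3]:
        return None
    # Drop everything from the first dot onward: "NM_020501.1" -> "NM_020501".
    return fields[3].split(".", 1)[0]


def clean_ids(lines):
    """Collect unique unversioned accessions, keeping first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        accession = strip_accession(line)
        if accession is not None and accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    return cleaned
```

Writing the result of clean_ids() to a text file, one accession per line, should give a list suitable for the "upload list" step.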

Then set the output format to "selected fields from primary and related
tables", choose the desired file type, and click "get output".

On the subsequent screen, check the boxes next to "name", "txStart", and
"txEnd".  Click "get output" and you should have the data you need.
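
The same lookup can also be scripted against the refGene.txt.gz dump from hgdownload instead of the Table Browser. The sketch below is only an illustration. It assumes the dump's tab-separated column order begins bin, name, chrom, strand, txStart, txEnd, and that the accessions have already been stripped of their version suffix. The function names are made up for this example.

```python
import csv


def parse_refgene(lines):
    """Map accession -> list of (chrom, txStart, txEnd, strand) tuples.

    `lines` is any iterable of text lines in refGene.txt layout.
    An accession can appear on more than one row, so values are lists.
    """
    coords = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank, short, or comment/header rows.
        if len(row) < 6 or row[0].startswith("#"):
            continue
        name, chrom, strand = row[1], row[2], row[3]
        coords.setdefault(name, []).append(
            (chrom, int(row[4]), int(row[5]), strand)
        )
    return coords


def split_matches(accessions, coords):
    """Separate accessions found in `coords` from those that are missing."""
    matched = {acc: coords[acc] for acc in accessions if acc in coords}
    unmatched = [acc for acc in accessions if acc not in coords]
    return matched, unmatched
```

To read the compressed file directly, something like gzip.open(path, "rt") can be passed as `lines`. The `unmatched` list is then the set of IDs worth checking against NCBI for removed or replaced records.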

Now, there are about 28,150 rows in that data set.  You may have more
refSeq ids because UCSC only displays Accession types that start with
NM_ and NR_.  There are several other types, see:
http://www.ncbi.nlm.nih.gov/projects/RefSeq/key.html#accessions. So we
only display a subset of the total.
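
To see how much of a list could match at all, it may help to tally the accessions by prefix first. This is a rough sketch that assumes the IDs are already reduced to bare accessions such as NM_020501. The helper names are hypothetical, and the prefix tuple simply encodes the NM_/NR_ statement above.

```python
from collections import Counter

# Prefixes that, per the explanation above, the RefSeq Genes track displays.
UCSC_PREFIXES = ("NM_", "NR_")


def count_by_prefix(accessions):
    """Count accessions per prefix, e.g. 'NM_' or 'XM_'."""
    return Counter(
        acc.split("_", 1)[0] + "_" for acc in accessions if "_" in acc
    )


def not_displayed(accessions):
    """Return accessions whose prefix is outside UCSC_PREFIXES."""
    return [acc for acc in accessions if not acc.startswith(UCSC_PREFIXES)]
```

Accessions returned by not_displayed(), such as XM_ model records, would not be expected in the track regardless of assembly.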

Hope that helps!

Just email the genome list if you have any additional questions.

-
Greg Roe
UCSC Genome Browser Group

On 2/6/11 1:31 PM, John, Shibu wrote:
> Hi,
>
> I have a list of 36692 NCBI RefSeq IDs in the following format (mouse,
> downloaded in May 2009):
> *****
> gi|10048421|ref|NM_020488.1|
> gi|10048425|ref|NM_020501.1|
> ****
> Is there any way to get the chromosome start and end positions for these IDs?
> (e.g. chr6   131636578       131637481  gi|10048425|ref|NM_020501.1|)
>
> I tried to intersect the "NM_" IDs with the UCSC file
> "http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/refGene.txt.gz":
> ********
> gi|10048425|ref|NM_020501.1|     Tas2r105        NM_020501       chr6    -    131636578       131637481
> ********
>
> But this "refGene.txt" contains only 28108 IDs.
>
> And I tried to find the gene with Entrez batch finder ..
> *******
> Id=XM_915912:   This record was removed as a result of standard genome 
> annotation processing. See the genome build documentation at 
> http://www.ncbi.nlm.nih.gov/genome/guide/build.html for further information, 
> or contact [email protected].
> Id=XM_912174:   This record was replaced or removed.
> ........
> .......
> Received lines: 36692
> Rejected lines: 17
> Removed duplicates: 0
> Passed to Entrez: 36675
> *********
>
> It would be a great help if you could help me get the chromosome start
> position of these genes or the corresponding ENSEMBL gene ID.
>
> Thanks,
> Shibu
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
