Hi Shibu, I checked a few of the samples in your list below, and they all seem to be sequences that have been removed from the RefSeq db. For example, see the revision history for this one:
==> http://www.ncbi.nlm.nih.gov/sviewer/girevhist.cgi?val=NM_001003930

Note, it says: "NM_001003930.1: This RefSeq was permanently suppressed because currently there is insufficient support for the transcript and the protein."

I think the list you have is outdated, and thus contains many sequences that are now listed as "suppressed" and are not included in the latest RefSeq db. You did mention that you downloaded your data in May 2009. The data in our RefSeq track is updated daily from NCBI. For update frequency info on this data at UCSC, please see: http://genome.ucsc.edu/FAQ/FAQreleases.html#release6

Hope that helps.

- Greg

On 2/8/11 4:15 PM, John, Shibu wrote:
> Hi Greg,
>
> Thanks for the detailed information.
> Finally I have 8726 "NM" id which does not match in UCSC...
> ******example**
> NM_001002792
> NM_001003669
> NM_001003914
> NM_001003930
> NM_001004138
> NM_001013392
> NM_001013793
> NM_001013800
> NM_001013808
> NM_001014425
> NM_001024731
> NM_001024838
> NM_001024839
> NM_001024840
> NM_001024849
> NM_001025208
> NM_001025241
> NM_001030310
> NM_001033001
> NM_001033127
> NM_001033151
> NM_001033160
> NM_001033198
> *******
> Thanks,
> Shibu
>
>
> ________________________________________
> From: Greg Roe [[email protected]]
> Sent: 08 February 2011 19:56
> To: John, Shibu
> Cc: [email protected]
> Subject: Re: [Genome] About NCBI gene coordinate from Gene ID or refseqID
>
> Hi Shibu,
>
> The easiest way to do this would be to use the Table Browser
> (http://genome.ucsc.edu/cgi-bin/hgTables). Select the assembly of
> interest, mm9 I assume. Then select:
>
> Group: Genes and Gene Prediction Tracks
> Track: RefSeq Genes
> Table: refGene
> Region: genome
>
> Then under identifiers click upload list. You'll need to make a list of
> all the gene ids, without all the extra data (ex: NM_020501). You'll
> need to remove the version numbering as well. So NM_020501.1 should be
> shown as NM_020501, without the .1.
> The easiest low-tech way to do this
> would be to load your data in a spreadsheet using the pipe as a column
> delimiter, removing the extra columns, then doing a find and replace to
> remove all the version designations, ".1", etc. Then save the gene ids
> to a text file and upload that.
>
> Then set the output format to "selected fields from primary and related
> tables", choose the desired file type returned, and click "get output".
>
> On the subsequent screen, check the boxes next to "name", "txStart", and
> "txEnd". Click "get output" and you should have the data you need.
>
> Now, there are about 28,150 rows in that data set. You may have more
> refSeq ids because UCSC only displays Accession types that start with
> NM_ and NR_. There are several other types, see:
> http://www.ncbi.nlm.nih.gov/projects/RefSeq/key.html#accessions. So we
> only display a subset of the total.
>
> Hope that helps!
>
> Just email the genome list if you have any additional questions.
>
> -
> Greg Roe
> UCSC Genome Browser Group
>
>
>
>
> On 2/6/11 1:31 PM, John, Shibu wrote:
>> Hi,
>>
>> I have a list of (36692) NCBI refSeq id in the following format. (Mouse,
>> downloaded on May 2009 )
>> *****
>> gi|10048421|ref|NM_020488.1|
>> gi|10048425|ref|NM_020501.1|
>> ****
>> Is there any way to get the chromosome start, end position of these geneID?
>> ( chr6 131636578 131637481 gi|10048425|ref|NM_020501.1|)
>>
>> I tried to intersect with the "NM_" id with UCSC
>> "http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/refGene.txt.gz"
>> ********
>> gi|10048425|ref|NM_020501.1| Tas2r105 NM_020501 chr6 -
>> 131636578 131637481
>> ********
>>
>> But this "refGene.txt" contains only 28108 id's..
>>
>> And I tried to find the gene with Entrez batch finder ..
>> *******
>> Id=XM_915912: This record was removed as a result of standard genome
>> annotation processing.
>> See the genome build documentation at
>> http://www.ncbi.nlm.nih.gov/genome/guide/build.html for further information,
>> or contact [email protected].
>> Id=XM_912174: This record was replaced or removed.
>> ........
>> .......
>> Received lines: 36692
>> Rejected lines: 17
>> Removed duplicates: 0
>> Passed to Entrez: 36675
>> *********
>>
>> It will be a great help that if you can help me to get the chromosome start
>> position of these genes or corresponding ENSEMBL gene ID.
>>
>> Thanks,
>> Shibu
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
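For anyone who would rather script the ID cleanup and lookup described in this thread than use a spreadsheet, here is a minimal Python sketch. It strips the "gi|...|ref|" wrapper and the version suffix from each ID, then looks the accession up in rows shaped like refGene.txt. The function names are hypothetical, and the column order (bin, name, chrom, strand, txStart, txEnd, ...) is an assumption about the tab-separated refGene dump that you should check against the table schema. The sample row is cut down to six columns, its bin value is a placeholder, and the gi number on the second ID is a placeholder too.

```python
import csv
import io


def accession_from_header(header):
    """Return the unversioned accession from an ID like 'gi|10048425|ref|NM_020501.1|'."""
    fields = header.strip().split("|")
    # Expected layout: ['gi', '<gi number>', 'ref', '<accession.version>', '']
    if len(fields) < 4 or fields[2] != "ref":
        raise ValueError("unexpected ID format: %r" % header)
    # Drop the version suffix, so NM_020501.1 becomes NM_020501.
    return fields[3].split(".")[0]


def load_refgene(handle):
    """Map accession -> list of (chrom, txStart, txEnd, strand) tuples.

    Assumes tab-separated rows ordered bin, name, chrom, strand, txStart, txEnd.
    An accession can map to more than one location, hence the list.
    """
    coords = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or truncated lines
        name, chrom, strand = row[1], row[2], row[3]
        coords.setdefault(name, []).append((chrom, int(row[4]), int(row[5]), strand))
    return coords


def split_found_missing(headers, coords):
    """Separate IDs that have coordinates from accessions that are absent."""
    found, missing = {}, []
    for header in headers:
        accession = accession_from_header(header)
        if accession in coords:
            found[header] = coords[accession]
        else:
            missing.append(accession)
    return found, missing


# Illustrative input only: one row using the coordinates quoted in this thread.
sample_refgene = io.StringIO("0\tNM_020501\tchr6\t-\t131636578\t131637481\n")
sample_ids = [
    "gi|10048425|ref|NM_020501.1|",
    "gi|0|ref|NM_001003930.1|",  # placeholder gi number; accession is suppressed
]

coords = load_refgene(sample_refgene)
found, missing = split_found_missing(sample_ids, coords)
print(found)
print(missing)
```

With a real download you would open the decompressed refGene.txt in place of the in-memory sample. The accessions left in the "missing" list are the ones worth checking against the NCBI revision history, since they may have been suppressed or replaced since the list was made.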
