Hello Dongyan,

The refFlat table (and the corresponding txt file on the download server)
contains native RefSeqs that align one or more times to the
corresponding genome.  If we were unable to align a RefSeq gene to the 
reference assembly using BLAT, it will not be in this file.  If a RefSeq 
gene aligned more than once to the reference assembly, it may be in this 
file more than once.  See the "methods" section of the RefSeq Genes 
track for details.
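
To see how much of the refFlat row count comes from repeated alignments, 
something like the following rough sketch may help.  It assumes the 
usual tab-separated refFlat layout, with the RefSeq accession in the 
second column; the sample rows and accessions are made up for 
illustration and are not real data.

```python
from collections import Counter

def count_alignments(lines):
    """Return a Counter mapping each accession to its number of rows.

    Assumes tab-separated refFlat lines with the accession in column 2.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1]:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts

# Hypothetical rows in refFlat layout (not real annotations).
sample = [
    "GENE_A\tNM_000001\tchr1\t+\t100\t200\t120\t180\t1\t100,\t200,\n",
    "GENE_A\tNM_000001\tchr1_random\t+\t500\t600\t520\t580\t1\t500,\t600,\n",
    "GENE_B\tNM_000002\tchr2\t-\t300\t400\t310\t390\t1\t300,\t400,\n",
]

counts = count_alignments(sample)
total_rows = sum(counts.values())          # rows in the file
unique_accessions = len(counts)            # distinct RefSeq accessions
repeated = sorted(acc for acc, n in counts.items() if n > 1)

print(total_rows, unique_accessions, repeated)
```

To run it on the real file, pass an open file handle instead of the 
sample list, for example count_alignments(open("refFlat.txt")).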

Also see this previously-answered question for a discussion of the 
refFlat table:
http://www.soe.ucsc.edu/pipermail/genome/2008-June/016588.html

You may be interested in downloading the RefSeq Gene sequences from our 
downloads page, here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

Look for this file:

refMrna.fa.gz - RefSeq mRNA from the same species as the genome.
     This sequence data is updated once a week via automatic GenBank
     updates.
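
If you want to compare the accessions in that file against your NCBI 
list, a minimal sketch along these lines might be a starting point.  It 
assumes that each FASTA header begins with the accession as its first 
whitespace-delimited token, and that accessions in the NCBI list may 
carry a version suffix such as ".3" that should be ignored.  The 
function names and the sample values are hypothetical.

```python
import gzip

def fasta_accessions(lines):
    """Collect accessions from FASTA header lines.

    Assumes the accession is the first token after '>'; any version
    suffix (".N") is dropped.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0].split(".")[0])
    return ids

def load_fasta_accessions(path):
    """Read a gzip-compressed FASTA file such as refMrna.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return fasta_accessions(handle)

def strip_versions(accessions):
    """Normalize a list of accessions by removing version suffixes."""
    return {a.strip().split(".")[0] for a in accessions if a.strip()}

# Hypothetical in-memory example (not real sequences or accessions).
fasta_lines = [">NM_000001 1\n", "acgt\n", ">NM_000002 1\n", "ttga\n"]
ncbi_list = ["NM_000001.3", "NM_000002.1", "NM_000003.2"]

ucsc_ids = fasta_accessions(fasta_lines)
ncbi_ids = strip_versions(ncbi_list)
only_in_ncbi = sorted(ncbi_ids - ucsc_ids)

print(only_in_ncbi)
```

For the downloaded file, load_fasta_accessions("refMrna.fa.gz") would 
take the place of the in-memory example.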

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Dongyan Song wrote:
> Hello,
> 
> I am trying to extract all human sequences from the refseq_rna  
> database at NCBI, and I need the accession numbers or GI numbers of  
> all genes in Homo sapiens. But the list I get from NCBI, which  
> contains 38,725 mRNAs, differs from the number given by refFlat.txt,  
> which is 27,090. So I wonder what the difference is between refFlat  
> and the gi_list (given by NCBI through a Nucleotide search like '"Homo  
> sapiens"[porgn:__txid9606]'AND'biomol mRNA').
> 
> Thank you very much!
> Best regards,
> Dongyan
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome