Hello Dongyan,

The refFlat table (and the corresponding .txt file on the download server) contains native RefSeqs, that is, RefSeq transcripts from the same species as the genome, that align one or more times to the corresponding genome. If we were unable to align a RefSeq gene to the reference assembly using BLAT, it will not be in this file. If a RefSeq gene aligned more than once to the reference assembly, it may appear in this file more than once. See the "Methods" section of the RefSeq Genes track description for details.
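Because of this, the number of lines in refFlat is not the same as the number of distinct RefSeq accessions, and neither matches a raw NCBI mRNA count. Below is a minimal Python sketch that makes the distinction visible. It assumes the standard 11-column, tab-separated refFlat layout (gene symbol in column 1, RefSeq accession in column 2, then chrom, strand and coordinates). The function names `open_maybe_gzip` and `summarize_refflat` are hypothetical helpers written for illustration, and the gzip handling only applies if your local copy is compressed.

```python
import gzip
from collections import Counter


def open_maybe_gzip(path):
    """Open a refFlat file as text, transparently handling a .gz copy."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def summarize_refflat(lines):
    """Count alignment rows and distinct RefSeq accessions in refFlat lines.

    Assumes the 11-column refFlat layout, where column 1 is the gene
    symbol and column 2 is the RefSeq accession (e.g. NM_000546).
    Returns (rows, distinct_accessions, accessions_aligned_more_than_once).
    """
    per_accession = Counter()
    for line in lines:
        # Skip blank lines and any comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # One refFlat row = one alignment of one accession.
        per_accession[fields[1]] += 1
    rows = sum(per_accession.values())
    multi = sorted(acc for acc, n in per_accession.items() if n > 1)
    return rows, len(per_accession), multi
```

For example, `with open_maybe_gzip("refFlat.txt") as fh: rows, distinct, multi = summarize_refflat(fh)` reports how many alignments the file holds, how many distinct accessions those represent, and which accessions aligned more than once. Accessions that BLAT could not align are simply absent, so they will not show up in either count.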
Also see this previously answered question for a discussion of the refFlat table:
http://www.soe.ucsc.edu/pipermail/genome/2008-June/016588.html

You may be interested in downloading the RefSeq Gene sequences from our downloads page, here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

Look for this file:
refMrna.fa.gz - RefSeq mRNA from the same species as the genome. This sequence data is updated once a week via automatic GenBank updates.

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Dongyan Song wrote:
> Hello,
>
> I am trying to extract all human sequences from the refseq_rna
> database in NCBI, and I need all the accession numbers or gi_no. of
> all genes in Homo sapiens. But the list I get from NCBI, which
> contains 38,725 mRNAs, differs from the number given by refFlat.txt,
> which is 27,090. So I wonder what the differences are between refFlat
> and the gi_list (given by NCBI when searching nucleotides with '"Homo
> sapiens"[porgn:__txid9606]'AND'biomol mRNA').
>
> Thank you very much!
> Best regards,
> Dongyan
>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
