Hello Dylan,

Here are a couple of previously answered questions that are similar to yours:

http://www.soe.ucsc.edu/pipermail/genome/2008-April/016256.html
http://www.soe.ucsc.edu/pipermail/genome/2007-February/012843.html

Note that you can search the mailing list archives on this page: http://genome.ucsc.edu/FAQ/ and browse or search from this page: http://genome.ucsc.edu/contacts.html .

I will also try to answer each of your questions:

(1) The refMrna.fa.gz file from this page (assuming you are using the latest human database):

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

contains mRNA sequences from the Reference Sequence collection at NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/). It is updated once a week. The sequences in this file are incorporated into a track at UCSC called "RefSeq Genes". To see how the track is created, click on the track name on the main Genome Browser page (http://genome.ucsc.edu/cgi-bin/hgTracks). In the methods section you will see:

    RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.

(2) No. Not all of the sequence in refMrna.fa.gz is aligned to the reference genome by blat.

(3) Click on the "View table schema" link from the track details page, or select the table in the Table Browser and hit the "describe table schema" button.

(4) Sometimes sequence from the refMrna.fa.gz file aligns to the genome more than once -- see the methods section of RefSeq Genes.

Since you are new to the Genome Browser, you might be interested in the online Genome Browser tutorials from Open Helix:

http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

Good luck with your research.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 11/13/08 17:01, Dylan Bobby wrote:
> Hi,
>
> I'm trying to understand the precise relationship between the RefSeq annotations in refGene.txt and the sequences in refSeq_mRNA.fa.
> I am new to mammalian genomes, RefSeq and the UCSC browser, so I think I just have some basic misunderstanding and maybe answers to these questions will help clear it up:
>
> (1) Are the refGene.txt and refSeq_mRNA.fa files synced with one another?
> (2) If I was to splice together all the exons annotated in refGene.txt for each RefSeq, would I cover all the sequence found in refSeq_mRNA.fa or would there still be some extra sequence in refSeq_mRNA.fa? If so, what is the extra sequence?
> (3) Is there a description of the columns in refGene.txt available? I would like to better understand that table.
> (4) Why are there multiple rows present for some RefSeq Ids in refGene.txt?
> (5) If these two files aren't meant to correspond, is there another sequence file that corresponds better with the annotations in refGene.txt?
>
> Thanks!
>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
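
As an illustrative footnote to answers (1), (2) and (4): the filtering rule quoted from the RefSeq Genes methods could be sketched roughly as below. This is not UCSC's actual pipeline code. The `Alignment` record, the `filter_alignments` function, the placeholder IDs, and the reading of "an alignment of less than 15%" as the aligned fraction of the RNA are all assumptions made for the example; only the three thresholds come from the methods text.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one blat alignment of one RefSeq RNA."""
    rna_id: str
    chrom: str
    coverage: float  # assumed meaning: fraction of the RNA that aligned (0..1)
    identity: float  # base identity with the genomic sequence (0..1)


def filter_alignments(alignments, min_coverage=0.15,
                      min_identity=0.96, near_best=0.001):
    """Apply the three thresholds described in the methods text."""
    # Step 1: discard alignments covering less than 15% of the RNA.
    by_rna = {}
    for aln in alignments:
        if aln.coverage < min_coverage:
            continue
        by_rna.setdefault(aln.rna_id, []).append(aln)

    kept = []
    for group in by_rna.values():
        # Step 2: find the highest base identity for this RNA.
        best = max(aln.identity for aln in group)
        # Step 3: keep alignments within 0.1% of the best that also
        # have at least 96% base identity.
        for aln in group:
            if aln.identity >= min_identity and best - aln.identity <= near_best:
                kept.append(aln)
    return kept


# Made-up example data; the IDs are placeholders, not real accessions.
example = [
    Alignment("NM_A", "chr1", 0.99, 0.9990),
    Alignment("NM_A", "chr5", 0.98, 0.9985),  # near-best: second row for NM_A
    Alignment("NM_A", "chrX", 0.90, 0.9700),  # too far below the best
    Alignment("NM_B", "chr2", 0.10, 1.0000),  # aligned fraction under 15%
    Alignment("NM_C", "chr3", 0.80, 0.9500),  # identity under 96%
]

for aln in filter_alignments(example):
    print(aln.rna_id, aln.chrom)
```

In this made-up data, NM_A keeps two alignments, which is how one RefSeq ID can end up with more than one row in the table, while NM_B and NM_C keep none, so their sequence would be present in the mRNA file but absent from the annotations.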
