Hello Dylan,

Here are two previously answered questions that are similar to yours:
http://www.soe.ucsc.edu/pipermail/genome/2008-April/016256.html
http://www.soe.ucsc.edu/pipermail/genome/2007-February/012843.html

Note that you can search the mailing list archives from this page:
http://genome.ucsc.edu/FAQ/
and browse or search them from this page:
http://genome.ucsc.edu/contacts.html

I will also try to answer each of your questions:

(1) The refMrna.fa.gz file from this page (assuming you are using the 
latest human database): 
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
contains mRNA sequences from the Reference Sequence collection at NCBI 
(http://www.ncbi.nlm.nih.gov/RefSeq/).  It is updated once a week.  The 
sequences in this file are incorporated into a track at UCSC called 
"RefSeq Genes".  To see how the track is created, click on the track 
name on the main Genome Browser page 
(http://genome.ucsc.edu/cgi-bin/hgTracks).  In the methods section you 
will see:

RefSeq RNAs were aligned against the human genome using blat; those with 
an alignment of less than 15% were discarded. When a single RNA aligned 
in multiple places, the alignment having the highest base identity was 
identified. Only alignments having a base identity level within 0.1% of 
the best and at least 96% base identity with the genomic sequence were kept.
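
As a rough illustration of that filtering rule (not the actual UCSC pipeline code), here is a minimal Python sketch. It assumes each alignment has already been reduced to an (id, coverage, identity) tuple of fractions, that "an alignment of less than 15%" refers to the fraction of the RNA that aligned, and that "within 0.1% of the best" means an absolute difference in identity. The function name, the tuple layout and the IDs are hypothetical.

```python
from collections import defaultdict


def filter_alignments(alignments, min_coverage=0.15, near_best=0.001,
                      min_identity=0.96):
    """Keep near-best alignments per RNA (illustrative sketch only).

    alignments: iterable of (rna_id, coverage, identity) tuples, where
    coverage and identity are fractions between 0 and 1 (an assumption).
    """
    by_rna = defaultdict(list)
    for rna_id, coverage, identity in alignments:
        # Discard alignments that cover less than 15% of the RNA.
        if coverage >= min_coverage:
            by_rna[rna_id].append((coverage, identity))

    kept = {}
    for rna_id, hits in by_rna.items():
        # Highest base identity among the remaining alignments of this RNA.
        best = max(identity for _, identity in hits)
        survivors = [
            (coverage, identity)
            for coverage, identity in hits
            if identity >= best - near_best and identity >= min_identity
        ]
        if survivors:
            kept[rna_id] = survivors
    return kept


# Hypothetical example input; the IDs are placeholders, not real accessions.
example = [
    ("NM_A", 0.90, 0.995),
    ("NM_A", 0.90, 0.9945),
    ("NM_A", 0.90, 0.98),
    ("NM_B", 0.10, 0.99),
    ("NM_C", 0.80, 0.95),
]
kept = filter_alignments(example)
```

Note that an RNA can keep more than one alignment under this rule, and some RNAs keep none at all.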

(2) No.  Not all of the sequence in refMrna.fa.gz is aligned to the 
reference genome by blat.
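
If you want to check this for a particular RefSeq, one option is to compare the summed exon lengths from a refGene.txt row with the length of the corresponding sequence in the FASTA file. The sketch below is only a rough check: it assumes that exonStarts and exonEnds are comma-separated lists with a trailing comma, that starts are 0-based and ends are exclusive, and the function names are hypothetical. Please confirm the column layout against the table schema before relying on it.

```python
def spliced_exon_length(exon_starts, exon_ends):
    """Sum exon lengths from two comma-separated coordinate strings.

    Assumes 0-based starts and exclusive ends, e.g. "100,300," and "200,450,".
    """
    starts = [int(x) for x in exon_starts.strip(",").split(",")]
    ends = [int(x) for x in exon_ends.strip(",").split(",")]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return sum(end - start for start, end in zip(starts, ends))


def length_difference(mrna_length, exon_starts, exon_ends):
    """mRNA length minus spliced exon length.

    A positive value suggests mRNA bases that the annotated exons do not
    cover; small differences can also come from gaps in the alignment.
    """
    return mrna_length - spliced_exon_length(exon_starts, exon_ends)


# Made-up coordinates for illustration: two exons of 100 and 150 bases.
total = spliced_exon_length("100,300,", "200,450,")
extra = length_difference(260, "100,300,", "200,450,")
```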

(3) Click on the "View table schema" link from the track details page, 
or select the table in the Table Browser and hit the "describe table 
schema" button.

(4) Sometimes sequence from the refMrna.fa.gz file aligns to the genome 
more than once -- see the methods section of RefSeq Genes.
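
To list the RefSeq IDs that have more than one row, a short Python sketch like the following may help. It assumes refGene.txt is tab-separated and that the accession is in the second column (index 1, after a leading bin column); the function name is hypothetical, so adjust name_column if your schema differs.

```python
from collections import Counter


def multi_row_accessions(lines, name_column=1):
    """Return {accession: row_count} for accessions on more than one row."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or short lines that do not have the name column.
        if len(fields) > name_column:
            counts[fields[name_column]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Made-up rows for illustration; the IDs are placeholders.
rows = [
    "0\tNM_X\tchr1\t+\n",
    "0\tNM_Y\tchr2\t-\n",
    "1\tNM_X\tchr7\t+\n",
]
duplicates = multi_row_accessions(rows)
```

For the real file, you could pass an open file handle instead of a list, for example `with open("refGene.txt") as fh: multi_row_accessions(fh)`.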

Since you are new to the Genome Browser, you might be interested in the 
online Genome Browser tutorials from Open Helix:
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

Good luck with your research.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 11/13/08 17:01, Dylan Bobby wrote:
> Hi,
>
> I'm trying to understand the precise relationship between the RefSeq
> annotations in refGene.txt and the sequences in refSeq_mRNA.fa. I am new
> to mammalian genomes, RefSeq and the UCSC browser, so I think I just
> have some basic misunderstanding and maybe answers to these questions
> will help clear it up:
>
> (1) Are the refGene.txt and refSeq_mRNA.fa files synced with one
> another?
> (2) If I was to splice together all the exons annotated in
> refGene.txt for each RefSeq, would I cover all the sequence found in
> refSeq_mRNA.fa or would there still be some extra sequence in
> refSeq_mRNA.fa? If so, what is the extra sequence?
> (3) Is there a description of the columns in refGene.txt available? I
> would like to better understand that table.
> (4) Why are there multiple rows present for some RefSeq Ids in
> refGene.txt?
> (5) If these two files aren't meant to correspond, is there another
> sequence file that corresponds better with the annotations in
> refGene.txt?
>
> Thanks!
>
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
