Hi Brooke,

Very helpful response.  I apologize for being somewhat redundant with
previous questions, but I did not come across the emails you listed in my
searches.

In response to (2), then: is it possible to get a list of all the
RefSeqs for which you successfully aligned 100% of the sequence to the genome?
Ideally I would like to work with just that subset of sequences.  For now I
just need this list for human, but a generalized approach would be better in
the long term.
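
To make the request concrete, here is a rough screen one could run locally. It is only a sketch under assumptions: it assumes refGene.txt is tab-separated with a leading bin column (so the accession is column 1, exonStarts column 9 and exonEnds column 10, counting from 0), that exon coordinates are 0-based and half-open, and that the first word of each FASTA header is the accession. The function names are made up for illustration. Equal lengths do not prove a 100% alignment; they only flag candidates.

```python
def spliced_length(exon_starts, exon_ends):
    """Sum exon sizes from refGene-style comma-separated coordinate lists."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    return sum(end - start for start, end in zip(starts, ends))


def fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def length_matched(refgene_rows, mrna_lengths,
                   name_col=1, starts_col=9, ends_col=10):
    """Return accessions whose spliced exon length equals the mRNA length.

    Column positions are assumptions about the refGene.txt layout.
    """
    matched = set()
    for row in refgene_rows:
        if not row.strip():
            continue  # skip blank lines
        fields = row.rstrip("\n").split("\t")
        name = fields[name_col]
        exon_total = spliced_length(fields[starts_col], fields[ends_col])
        if exon_total == mrna_lengths.get(name):
            matched.add(name)
    return matched
```

With the real files, `refgene_rows` could be an open handle on refGene.txt and the FASTA lines could come from `gzip.open("refMrna.fa.gz", "rt")`.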

Thanks!

--- On Fri, 11/14/08, Brooke Rhead <[EMAIL PROTECTED]> wrote:
From: Brooke Rhead <[EMAIL PROTECTED]>
Subject: Re: [Genome] Relationship between refGene.txt and refMrna.fa.gz?
To: [EMAIL PROTECTED]
Cc: [email protected]
Date: Friday, November 14, 2008, 7:47 PM

Hello Dylan,

Here are a couple of previously answered questions that are similar to
yours:
http://www.soe.ucsc.edu/pipermail/genome/2008-April/016256.html
http://www.soe.ucsc.edu/pipermail/genome/2007-February/012843.html

Note that you can search the mailing list archives on this page:
http://genome.ucsc.edu/FAQ/ and browse or search from this page:
http://genome.ucsc.edu/contacts.html .

I will also try to answer each of your questions:

(1) The refMrna.fa.gz file from this page (assuming you are using the latest
human database): http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
contains mRNA sequences from the Reference Sequence collection at NCBI
(http://www.ncbi.nlm.nih.gov/RefSeq/).  It is updated once a week.  The
sequences in this file are incorporated into a track at UCSC called "RefSeq
Genes".  To see how the track is created, click on the track name on the
main Genome Browser page (http://genome.ucsc.edu/cgi-bin/hgTracks).  In the
methods section you will see:

RefSeq RNAs were aligned against the human genome using blat; those with an
alignment of less than 15% were discarded. When a single RNA aligned in multiple
places, the alignment having the highest base identity was identified. Only
alignments having a base identity level within 0.1% of the best and at least 96%
base identity with the genomic sequence were kept.

(2) No.  Not all of the sequence in refMrna.fa.gz is aligned to the reference
genome by blat.

(3) Click on the "View table schema" link from the track details
page, or select the table in the Table Browser and hit the "describe table
schema" button.

(4) Sometimes sequence from the refMrna.fa.gz file aligns to the genome more
than once -- see the methods section of RefSeq Genes.
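
As an illustration of how the filtering rules quoted in (1) can leave more than one alignment for a single RefSeq, here is a minimal sketch. It is not the actual UCSC pipeline code. It assumes that "an alignment of less than 15%" refers to the fraction of the RNA that aligned, and that "within 0.1% of the best" means an absolute difference of 0.001 in base identity. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict

MIN_COVERAGE = 0.15   # assumed meaning of "alignment of less than 15%"
MIN_IDENTITY = 0.96   # "at least 96% base identity"
NEAR_BEST = 0.001     # assumed meaning of "within 0.1% of the best"


def filter_alignments(alignments):
    """Apply the quoted rules to (accession, locus, coverage, identity) tuples.

    Coverage and identity are fractions between 0 and 1. Returns a dict
    mapping each accession to the loci that survive the filter.
    """
    by_accession = defaultdict(list)
    for accession, locus, coverage, identity in alignments:
        if coverage >= MIN_COVERAGE:
            by_accession[accession].append((locus, identity))

    kept = {}
    for accession, hits in by_accession.items():
        best = max(identity for _, identity in hits)
        loci = [locus for locus, identity in hits
                if identity >= MIN_IDENTITY and identity >= best - NEAR_BEST]
        if loci:
            kept[accession] = loci
    return kept
```

Under these assumptions, an RNA with two near-identical genomic copies keeps both loci, which would presumably show up as two rows for the same accession in refGene.txt.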

Since you are new to the Genome Browser, you might be interested in the online
Genome Browser tutorials from Open Helix:
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

Good luck with your research.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 11/13/08 17:01, Dylan Bobby wrote:
> Hi,
>
> I'm trying to understand the precise relationship between the RefSeq
> annotations in refGene.txt and the sequences in refSeq_mRNA.fa. I am new
> to mammalian genomes, RefSeq and the UCSC browser, so I think I just
> have some basic misunderstanding and maybe answers to these questions
> will help clear it up:
>
> (1) Are the refGene.txt and refSeq_mRNA.fa files synced with one
> another?
> (2) If I was to splice together all the exons annotated in
> refGene.txt for each RefSeq, would I cover all the sequence found in
> refSeq_mRNA.fa or would there still be some extra sequence in
> refSeq_mRNA.fa? If so, what is the extra sequence?
> (3) Is there a description of the columns in refGene.txt available? I
> would like to better understand that table.
> (4) Why are there multiple rows present for some RefSeq Ids in
> refGene.txt?
> (5) If these two files aren't meant to correspond, is there another
> sequence file that corresponds better with the annotations in refGene.txt?
> 
> Thanks!
> 
> 
> 
>       _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome



      
