Hello Aaron,
The refFlat annotation file lists the coordinates of both the RefSeq and 
genomic sequence that are involved in the annotation alignment. The 
RefMrna sequence data comes directly from GenBank. We use BLAT to align 
these sequences to the genome. In many cases, the entire RefSeq sequence 
will align, including the UTR regions, but in some cases portions of the 
sequence may not align (they do not match the genomic backbone sequence).

I suggest that you use the Table Browser to download sequence based on 
our annotation alignment coordinates. We do not store this particular 
sequence data as a pre-computed flat file in the downloads area - only 
the original RefSeq sequence as you have noticed. To do this, go to the 
main browser web page and follow these instructions:

1. go to http://genome.ucsc.edu/
2. click on "Tables" in the top blue bar or "Table Browser" in the side 
blue bar.
3. set clade/genome/assembly as desired.
4. set group to "Genes and Gene Prediction Tracks".
5. set track to "RefSeq Genes".
6. set table to "refGene". At this stage, you can "view table schema" 
for file contents. This works for any table in our database.
7. set region to "genome" for the entire assembly, "ENCODE" for regions 
with ENCODE annotation, or specify a genomic range.
8. at this point, you can also apply some filters by identifiers 
(sequence/gene names), table feature filter, or intersection (base 
overlap) with another track, including your own custom tracks with 
positional information.
9. set output format as "sequence".
10. enter an output file name so that the result downloads as a file 
(highly recommended if the result will contain more than a few sequences).
11. Submit. You will have the choice of getting the genomic, protein, or 
mRNA sequence based on the alignment coordinates. Choose mRNA for the 
RefSeq sequence.
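
If you would rather work from the refFlat coordinates directly, the following is a minimal Python sketch and not one of our tools. It assumes the standard refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and that the whole chromosome sequence, including any runs of "N", is already loaded as a string. Coordinates in our tables use a 0-based start and an exclusive end, counted along the full chromosome sequence, so no gap offset needs to be applied. The function names, the toy chromosome, and the accession NM_000000 are made up for illustration.

```python
from typing import Dict, List

# Translation table used to complement bases for minus-strand genes.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_refflat_line(line: str) -> Dict:
    """Parse one tab-separated refFlat line into a small record.

    Assumed column order: geneName, name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    """
    fields = line.rstrip("\n").split("\t")
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return {
        "geneName": fields[0],
        "name": fields[1],
        "chrom": fields[2],
        "strand": fields[3],
        "exons": list(zip(starts, ends)),
    }


def exon_sequences(record: Dict, chrom_seq: str) -> List[str]:
    """Return exon sequences in transcript order.

    Starts are 0-based and ends are exclusive, which matches Python
    slicing, so chrom_seq[start:end] is the exon on the plus strand.
    """
    seqs = [chrom_seq[start:end] for start, end in record["exons"]]
    if record["strand"] == "-":
        # Reverse the exon order and reverse-complement each exon.
        seqs = [s.translate(_COMPLEMENT)[::-1] for s in reversed(seqs)]
    return seqs


# Toy chromosome with a leading gap of five N's (hypothetical data).
toy_chrom = "NNNNNACGTACGTAAGGCC"
toy_line = "geneX\tNM_000000\tchrToy\t+\t5\t17\t5\t17\t2\t5,13,\t9,17,"
toy_record = parse_refflat_line(toy_line)
toy_exons = exon_sequences(toy_record, toy_chrom)
print(toy_exons)
```

In this toy case the first exon starts at position 5, directly after the five gap bases, which shows that the gap is counted in the coordinates rather than skipped.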

Some helpful links:
http://genome.ucsc.edu/cgi-bin/hgTables
http://genome.ucsc.edu/cgi-bin/hgTables#Help
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
http://genome.ucsc.edu/FAQ/FAQdownloads#download32

If you have any more questions, please let us know,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Aaron Skewes wrote:
> Hi Jennifer,
> Thank you for all your help. The RefMrna will be helpful, but what I need now
> is a way to extract only the exons. Is there not a refSeq that corresponds
> to your refFlat annotation? It was suggested to me that I do this by someone
> who has been in this business for many years. I fear that if your refFlat
> coordinates are not compatible with the refSeq for that organism/chromosome,
> it is a serious misunderstanding and will discourage researchers from using
> your annotations. Can you please clarify this for me?
>
> Thanks,
> Aaron
>
> -----Original Message-----
> From: Jennifer Jackson [mailto:[email protected]] 
> Sent: Monday, February 16, 2009 3:42 PM
> To: Aaron Skewes
> Cc: [email protected]
> Subject: Re: [Genome] refFlat feature locations do not correspond to
> nucleotide position
>
> Hi Aaron,
> An easier way of getting all of the RefSeq sequence is this file from 
> the downloads area /goldenPath/hg18/bigZips/
>
> refMrna.fa.gz - RefSeq mRNA from the same species as the genome.
>
>     This sequence data is updated once a week via automatic GenBank 
>     updates.
>
>
> This may be easier than tracking through multiple external files,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Aaron Skewes wrote:
>   
>> Hi,
>>
>>  
>>
>> I am attempting to extract the nucleotide sequences for exons in several
>> genomes based on their locations listed in the refFlat.txt. In almost all
>> cases, the exonStarts-exonEnds do not correspond to the nucleotide position
>> relative to the refSeq for that particular organism and chromosome. For
>> example, mouse build37 has a 30Mbp gap at the start of all chromosomes,
>> except for Y. This gap is shown in the sequence with "N" but that is omitted
>> from the refFlat table. In other words, nucleotide position 30x10^6 + 1 =
>> position 0 in the refFlat. In chicken (and others), there are gaps
>> interspersed throughout many of the assembled chromosomes, shown with "N",
>> but refFlat locations are not offset by the gap lengths.
>>
>>  
>>
>> Can somebody please suggest to me how I can extract genomic features based
>> on nucleotide position programmatically, if the refFlat positions do not
>> match the nucleotide positions and the offsets are unknown? 
>>
>>  
>>
>> Thank you,
>>
>> Aaron
>>
>>  
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>   
>>     
>
>   
_______________________________________________
Genome maillist  -  [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
