Hello Aaron,

The refFlat annotation file lists the coordinates of both the RefSeq and genomic sequences that are involved in the annotation alignment. The refMrna sequence data comes directly from GenBank, and we use BLAT to align that sequence to the genome. In many cases the entire RefSeq sequence aligns, including the UTR regions, but in some cases portions of the sequence may not align (they do not match the genomic backbone).
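Because the exonStarts/exonEnds values are positions on the genome assembly, exon sequences can be cut directly out of that assembly's chromosome sequence. The sketch below is one possible way to do this, not a UCSC tool. It assumes the standard 11-column refFlat layout and a chromosome sequence from the same UCSC assembly with its N gaps left in place. The function names and the toy record are hypothetical. UCSC starts are 0-based and ends are exclusive, so each exon maps to a plain Python slice. Minus-strand exons are reversed and reverse-complemented so the result reads in transcript order.

```python
"""Sketch: extract exon sequences for one refFlat record from a chromosome string.

Assumptions (hypothetical, not from the thread): standard 11-column refFlat
layout, and a chromosome sequence from the same UCSC assembly with N gaps kept.
"""

# Column order of refFlat.txt (tab-separated).
REFFLAT_FIELDS = [
    "geneName", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_refflat_line(line):
    """Split one refFlat line into a dict; exon starts/ends become int lists."""
    record = dict(zip(REFFLAT_FIELDS, line.rstrip("\n").split("\t")))
    # exonStarts/exonEnds are comma-separated with a trailing comma.
    for key in ("exonStarts", "exonEnds"):
        record[key] = [int(x) for x in record[key].split(",") if x]
    return record


def reverse_complement(seq):
    """Reverse-complement a DNA string (case preserved, N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def exon_sequences(record, chrom_seq):
    """Return exon sequences in transcript order.

    Starts are 0-based and ends exclusive, so chrom_seq[start:end] is the exon;
    gap Ns in chrom_seq are counted like any other base (no offset applied).
    """
    exons = [chrom_seq[s:e]
             for s, e in zip(record["exonStarts"], record["exonEnds"])]
    if record["strand"] == "-":
        # Minus-strand transcripts read right to left on the genome.
        exons = [reverse_complement(x) for x in reversed(exons)]
    return exons


if __name__ == "__main__":
    # Toy chromosome: 5 leading Ns stand in for an assembly gap.
    chrom = "NNNNN" + "ATGCC" + "TTTTT" + "GGATAA"
    line = "TOY\tNM_TOY\tchrToy\t+\t5\t21\t5\t21\t2\t5,15,\t10,21,\n"
    rec = parse_refflat_line(line)
    print(exon_sequences(rec, chrom))           # ['ATGCC', 'GGATAA']
    print("".join(exon_sequences(rec, chrom)))  # ATGCCGGATAA
```

Keeping the exons as a list, rather than joining them immediately, makes it easy to write one FASTA record per exon or to concatenate them into a spliced transcript.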
I suggest that you use the Table Browser to download sequence based on our annotation alignment coordinates. We do not store this particular sequence data as a pre-computed flat file in the downloads area, only the original RefSeq sequence, as you have noticed. To do this, go to the main browser web page and follow these instructions:

1. Go to http://genome.ucsc.edu/
2. Click on "Tables" in the top blue bar or "Table Browser" in the side blue bar.
3. Set clade/genome/assembly as desired.
4. Set group to "Genes and Gene Prediction Tracks".
5. Set track to "RefSeq Genes".
6. Set table to "refGene". At this stage, you can click "view table schema" to see the file contents. This works for any table in our database.
7. Set region to "genome" for the entire assembly, "ENCODE" for regions with ENCODE annotation, or specify a genomic range.
8. At this point, you can also apply filters by identifier (sequence/gene names), by table feature, or by intersection (base overlap) with another track, including your own custom tracks with positional information.
9. Set output format to "sequence".
10. Enter an output file name so the result downloads as a file (highly recommended if the result will contain more than a few sequences).
11. Click Submit.

You will have the choice of getting the genomic, protein, or mRNA sequence based on the alignment coordinates. Choose mRNA for the RefSeq sequence.

Some helpful links:
http://genome.ucsc.edu/cgi-bin/hgTables
http://genome.ucsc.edu/cgi-bin/hgTables#Help
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
http://genome.ucsc.edu/FAQ/FAQdownloads#download32

If you have any more questions, please let us know,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Aaron Skewes wrote:
> Hi Jennifer,
> Thank you for all your help. The refMrna will be helpful, but what I need now
> is a way to extract only the exons. Is there not a RefSeq that corresponds
> to your refFlat annotation? This was suggested to me by someone who has
> been in this business for many years.
> I fear that if your refFlat coordinates are not compatible with the RefSeq
> for that organism/chromosome, it is a serious misunderstanding and will
> discourage researchers from using your annotations. Can you please clarify
> this for me?
>
> Thanks,
> Aaron
>
> -----Original Message-----
> From: Jennifer Jackson [mailto:[email protected]]
> Sent: Monday, February 16, 2009 3:42 PM
> To: Aaron Skewes
> Cc: [email protected]
> Subject: Re: [Genome] refFlat feature locations do not correspond to
> nucleotide position
>
> Hi Aaron,
> An easier way of getting all of the RefSeq sequence is this file from
> the downloads area /goldenPath/hg18/bigZips/
>
> refMrna.fa.gz - RefSeq mRNA from the same species as the genome.
>
> This sequence data is updated once a week via automatic GenBank
> updates.
>
> This may make it easier than tracking through multiple external files,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Aaron Skewes wrote:
>> Hi,
>>
>> I am attempting to extract the nucleotide sequences for exons in several
>> genomes based on their locations listed in the refFlat.txt. In almost all
>> cases, the exonStarts-exonEnds do not correspond to the nucleotide position
>> relative to the RefSeq for that particular organism and chromosome. For
>> example, mouse build37 has a 30Mbp gap at the start of all chromosomes,
>> except for Y. This gap is shown in the sequence with "N" but that is omitted
>> from the refFlat table. In other words, nucleotide position 30x10^6 + 1 =
>> position 0 in the refFlat. In chicken (and others), there are gaps
>> interspersed throughout many of the assembled chromosomes, shown with "N",
>> but refFlat locations are not offset by the gap lengths.
>>
>> Can somebody please suggest to me how I can extract genomic features based
>> on nucleotide position programmatically, if the refFlat positions do not
>> match the nucleotide positions and the offsets are unknown?
>>
>> Thank you,
>>
>> Aaron
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
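Aaron's original concern was that N gaps shift the refFlat positions. One way to check this on your own data is to list the gap runs of a chromosome in the same 0-based, half-open convention that refFlat uses and compare them with the exon coordinates. The sketch below assumes a one-record FASTA file such as a UCSC per-chromosome chrN.fa from the same assembly as the annotation. The function names are hypothetical, and the file name in the comment is only an example.

```python
"""Sketch: list N gaps in a chromosome sequence as 0-based, half-open intervals.

Assumptions (hypothetical): one FASTA record per file, taken from the same
assembly as the refFlat annotation.
"""
import re


def read_single_fasta(path):
    """Return the sequence of a one-record FASTA file as a single string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">"))


def gap_runs(seq):
    """Yield (start, end) for each run of N/n; start 0-based, end exclusive."""
    for match in re.finditer(r"[Nn]+", seq):
        yield match.start(), match.end()


if __name__ == "__main__":
    # Toy sequence; with a real file you might call read_single_fasta("chr1.fa").
    seq = "NNNNNACGTNNNACGT"
    print(list(gap_runs(seq)))  # [(0, 5), (9, 12)]
```

Since the N runs stay in the string, their (start, end) values and the refFlat exon coordinates index the same sequence, so no offset has to be computed when slicing.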
