Hi Aaron,
Thank you for sending more data. I think the confusion is that you 
are extracting genomic sequence using the RefSeq alignment coordinates, 
rather than retrieving the RefSeq sequence itself.

For your chicken example, please open the chicken assembly in the 
Genome Browser and enter NM_001031574 in the position/search box. In 
the display window, click on the sequence name LOC429451. This will 
lead you to a sequence description page (where NM_001031574 is shown 
as an alternate name).

The alignment between the genomic sequence and this RefSeq sequence is 
in the section mRNA/Genomic Alignments. Click on "View details of parts 
of alignment within browser window." From there, you can see the 
side-by-side alignments.

Try the Table Browser using the output format "sequence" and the track 
RefSeq Genes to extract the RefSeq sequence instead of genomic. To do 
this, select the genome, track, main table refGene, and limit by genomic 
location or add in some identifiers. Set output to "sequence". Click on 
"get output". You will then be allowed to select either genomic (what 
the chromosome fasta files represent), protein (translation using 
predicted or known frame), or mRNA. I would think that mRNA is the 
output that you want.
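
If you still want to work from the coordinates yourself, here is a
minimal sketch in Python, not an official UCSC tool. It assumes the
refFlat column order (geneName, name, chrom, strand, txStart, txEnd,
cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and 0-based starts
with end-exclusive ends, counted against the full chromosome sequence
including any "N" gap bases. The function names and the toy data are
made up for illustration.

```python
# Sketch only: splice exon blocks out of a chromosome sequence using
# refFlat-style coordinates (assumed 0-based start, end-exclusive).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_refflat_line(line):
    """Parse one tab-separated refFlat row into a small dict."""
    fields = line.rstrip("\n").split("\t")
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return {
        "gene": fields[0],
        "name": fields[1],
        "chrom": fields[2],
        "strand": fields[3],
        "exons": list(zip(starts, ends)),
    }


def spliced_sequence(chrom_seq, record):
    """Concatenate exon blocks; reverse-complement for the minus strand."""
    seq = "".join(chrom_seq[start:end] for start, end in record["exons"])
    return reverse_complement(seq) if record["strand"] == "-" else seq


# Toy chromosome: the leading N's are counted in the coordinates.
toy_chrom = "NNNNNAAACCCGGGTTTNNNNN"
toy_row = "toyGene\tNM_TOY\tchrT\t+\t5\t14\t5\t14\t2\t5,11,\t8,14,"
toy_record = parse_refflat_line(toy_row)
print(spliced_sequence(toy_chrom, toy_record))
```

Note that bases spliced from the genome this way can still differ from
the RefSeq mRNA wherever the alignment has mismatches or indels, which
is why the mRNA output option is the safer choice.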

You can also download the appropriate fasta files to obtain RefSeq 
sequence without using the Table Browser (you may first want to look 
at their content and relationships using the Table Browser "view table 
schema" tool). Suggested files: gbSeq, gbExtFile.txt, and the files 
that gbExtFile.txt points to, for example 
/gbdb/genbank/./data/processed/refseq.33/daily.2009.0124/mrna.fa.
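
Once you have a fasta file locally, pulling out one record can be
sketched as below. This is only an illustration: it assumes the first
whitespace-separated token of each header line is the accession,
possibly with a ".version" suffix, which you should check against the
actual file. The function names and the toy records are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def find_by_accession(handle, accession):
    """Return the sequence whose header starts with the accession, or None.

    Assumption: the accession is the first token of the header and any
    version suffix (for example ".2") can be ignored.
    """
    for header, seq in read_fasta(handle):
        tokens = header.split()
        if tokens and tokens[0].split(".")[0] == accession:
            return seq
    return None


# Toy in-memory file standing in for a downloaded mrna.fa.
toy_fasta = io.StringIO(">NM_TOY1.2 toy record\nACGT\nAC\n>NM_TOY2.1\nGGG\n")
print(find_by_accession(toy_fasta, "NM_TOY1"))
```

For a real file, pass an open file object, for example
open("mrna.fa"), in place of the io.StringIO toy.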

Try this and let me know if this clears things up,
Jennifer Jackson
UCSC Genome Bioinformatics Group


Aaron Skewes wrote:
> Hi,
>
>  
>
> I am attempting to extract the nucleotide sequences for exons in several
> genomes based on their locations listed in the refFlat.txt. In almost all
> cases, the exonStarts-exonEnds do not correspond to the nucleotide position
> relative to the refSeq for that particular organism and chromosome. For
> example, mouse build37 has a 30Mbp gap at the start of all chromosomes,
> except for Y. This gap is shown in the sequence with "N" but that is omitted
> from the refFlat table. In other words, nucleotide position 30x10^6 + 1 =
> position 0 in the refFlat. In chicken (and others), there are gaps
> interspersed throughout many of the assembled chromosomes, shown with "N",
> but refFlat locations are not offset by the gap lengths.
>
>  
>
> Can somebody please suggest to me how I can extract genomic features based
> on nucleotide position programmatically, if the refFlat positions do not
> match the nucleotide positions and the offsets are unknown? 
>
>  
>
> Thank you,
>
> Aaron
>
>  
>
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>   