Hi Aaron,
Thank you for sending your examples. I do not see a discrepancy 
between the refFlat file, the chromosome FASTA sequences, and what is 
displayed in the browser for each.

In Mouse July 2007 (mm9, NCBI Build 37), bases 1-3,000,000 of chr1 are 
annotated as "N". This represents the (estimated) telomeric region. The 
Table Browser can be used to determine the exact length for any given 
assembly. The links in this previous thread provide instructions for how 
to do this:
http://www.soe.ucsc.edu/pipermail/genome/2008-July/016798.html
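
If you would rather check the "N" regions directly in a downloaded 
chromosome sequence, a rough sketch along these lines may help. It is not 
a UCSC tool; the function name n_runs is made up for illustration, and it 
assumes the whole chromosome sequence has already been read into one 
Python string with the FASTA header line and line breaks removed.

```python
import re


def n_runs(seq):
    """Return zero-based, half-open (start, end) intervals for each run of N.

    The intervals are positions in the sequence string itself, so a leading
    run of N shows up as an interval starting at 0. Nothing is subtracted
    from later positions: the N bases count as ordinary positions.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]
```

On a toy string such as "NNNACGTNNAC" this reports (0, 3) and (7, 9). 
The bases after each run keep their positions in the full string, which is 
the same way coordinates behave on the assembled chromosome.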

The coordinates for your mouse RefSeq example sequence, NM_013715, are 
identical between the mm9 genome assembly browser and the refFlat 
table/file. Your example from Chicken is also consistent. You do not 
need to adjust alignment coordinates to compensate for any "N" regions 
in the base chromosome. The only coordinate adjustments you may require are:
1) to interpret alignments on the negative strand correctly. Here is a 
link to a thread explaining: 
http://www.soe.ucsc.edu/pipermail/genome/2007-September/014688.html
2) to interpret the zero-based start coordinate correctly. Here is 
a link to our FAQ explaining: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
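
To make the two adjustments above concrete, here is a minimal sketch in 
Python of pulling exon sequences out of a chromosome string using one 
refFlat line. It is only an illustration under some assumptions: the 
function names are hypothetical, the chromosome is assumed to be loaded as 
a single string (header and newlines removed, "N" bases left in place), and 
the refFlat line is assumed to have the usual 11 tab-separated columns 
(geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, 
exonCount, exonStarts, exonEnds). Because starts are zero-based and ends 
are one-based, the pairs can be used directly as Python slice bounds.

```python
# Complement table for reverse-complementing minus-strand features.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_refflat_line(line):
    """Parse one refFlat line into (name, chrom, strand, exons).

    exons is a list of (start, end) pairs in chromosome order, with the
    zero-based start and one-based end taken unchanged from the file.
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[1], fields[2], fields[3]
    # exonStarts and exonEnds are comma-separated with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return name, chrom, strand, list(zip(starts, ends))


def exon_sequences(chrom_seq, exons, strand):
    """Return exon sequences in transcript order.

    No offset is applied for N regions: the coordinates index the full
    chromosome string, N bases included.
    """
    seqs = [chrom_seq[start:end] for start, end in exons]
    if strand == "-":
        # On the minus strand the last exon in chromosome order is the
        # first exon of the transcript, and each one is reverse-complemented.
        seqs = [reverse_complement(s) for s in reversed(seqs)]
    return seqs
```

For a made-up chromosome "NNNNNAAGCTTGGGCCC" and exons listed as starts 
"5,11," and ends "9,14,", this gives "AAGC" and "GGG" on the plus strand, 
and "CCC" then "GCTT" on the minus strand. The five leading "N" bases are 
simply counted as positions 0 through 4.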

I hope this helps to clarify the data.
Please let us know if you need any additional help/information or if I 
misinterpreted your question,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Aaron Skewes wrote:
> Hi,
>
>  
>
> I am attempting to extract the nucleotide sequences for exons in several
> genomes based on their locations listed in the refFlat.txt. In almost all
> cases, the exonStarts-exonEnds do not correspond to the nucleotide position
> relative to the refSeq for that particular organism and chromosome. For
> example, mouse build37 has a 30Mbp gap at the start of all chromosomes,
> except for Y. This gap is shown in the sequence with "N" but that is omitted
> from the refFlat table. In other words, nucleotide position 30x10^6 + 1 =
> position 0 in the refFlat. In chicken (and others), there are gaps
> interspersed throughout many of the assembled chromosomes, shown with "N",
> but refFlat locations are not offset by the gap lengths.
>
>  
>
> Can somebody please suggest to me how I can extract genomic features based
> on nucleotide position programmatically, if the refFlat positions do not
> match the nucleotide positions and the offsets are unknown? 
>
>  
>
> Thank you,
>
> Aaron
>
>  
>
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>   