Hi Aaron,
An easier way to get all of the RefSeq sequences is to use this file from
the downloads area, /goldenPath/hg18/bigZips/:
refMrna.fa.gz - RefSeq mRNA from the same species as the genome.
This sequence data is updated once a week via automatic GenBank
updates.
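As a rough sketch (not an official UCSC tool), the file could be read in
Python along these lines. The function names read_fasta and load_refmrna are
made up for illustration, and the code assumes that each FASTA header line
begins with the RefSeq accession as its first whitespace-delimited token; it
is worth checking a few header lines of the downloaded file first.

```python
import gzip
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an open FASTA text handle."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the accession is the first token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield name, "".join(chunks)


def load_refmrna(path: str) -> Dict[str, str]:
    """Load a gzip-compressed FASTA file into a dict keyed by record id."""
    with gzip.open(path, "rt") as handle:
        return dict(read_fasta(handle))
```

For example, after downloading the file, load_refmrna("refMrna.fa.gz")
would return a dictionary from which a single mRNA sequence can be looked up
by the id found in its header line.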
Using this file may be easier than tracking through multiple external files.
Jennifer Jackson
UCSC Genome Bioinformatics Group
Aaron Skewes wrote:
> Hi,
>
> I am attempting to extract the nucleotide sequences for exons in several
> genomes based on their locations listed in the refFlat.txt. In almost all
> cases, the exonStarts-exonEnds do not correspond to the nucleotide position
> relative to the refSeq for that particular organism and chromosome. For
> example, mouse build37 has a 30Mbp gap at the start of all chromosomes,
> except for Y. This gap is shown in the sequence with "N" but that is omitted
> from the refFlat table. In other words, nucleotide position 30x10^6 + 1 =
> position 0 in the refFlat. In chicken (and others), there are gaps
> interspersed throughout many of the assembled chromosomes, shown with "N",
> but refFlat locations are not offset by the gap lengths.
>
> Can somebody please suggest to me how I can extract genomic features based
> on nucleotide position programmatically, if the refFlat positions do not
> match the nucleotide positions and the offsets are unknown?
>
> Thank you,
>
> Aaron
>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>