Good Morning Sudeep:
Use the .2bit file; it contains all of the sequence and is masked.
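Pull the coordinates (columns 1, 4 and 5) out of your GTF file to make
a BED file. GTF starts are 1-based and BED starts are 0-based, so subtract
1 from the start, and write tab-separated output:
awk 'BEGIN{FS=OFS="\t"} !/^#/ {print $1,$4-1,$5,$3}' yourFile.gtf > coordinates.bed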
Then use the twoBitToFa program to extract those sequences from the .2bit file:
twoBitToFa -bed=coordinates.bed susScr2.2bit result.fa
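
As a rough sketch of the GTF-to-BED step, here is the conversion run on a tiny made-up input. The file names sample.gtf and sample.bed and the two records in them are hypothetical, and the sketch assumes a tab-separated GTF. Appending the line number to the feature type is only one possible way to keep the BED names distinct.

```shell
# Hypothetical two-record GTF, tab-separated, for illustration only.
printf 'chr1\tsample\texon\t101\t200\t.\t+\t.\tgene_id "g1";\n' > sample.gtf
printf 'chr2\tsample\tCDS\t5\t10\t.\t-\t0\tgene_id "g2";\n' >> sample.gtf

# GTF starts are 1-based and BED starts are 0-based, so subtract 1 from
# column 4. The end coordinate (column 5) is carried over unchanged.
# The name column joins the feature type and the input line number
# (an assumption made here so that every BED name is distinct).
awk 'BEGIN { FS = OFS = "\t" }
     !/^#/ { print $1, $4 - 1, $5, $3 "_" NR }' sample.gtf > sample.bed

cat sample.bed
```

The resulting BED file can then be passed to twoBitToFa through -bed= in the same way as coordinates.bed above.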
--Hiram
sudeep s wrote:
> Dear Mailing list,
>
> I have a list of interesting nucleotide coordinate positions from a UCSC
> GTF file (Organism: Sus scrofa, build: Sscrofa9.2). I now want the
> corresponding sequences for those coordinates. Since my list is a
> little big, I plan to download the chromosome sequences and get the
> sequences through ad hoc scripting or EMBOSS tools. But when I look at the
> sequence & annotation download page
> (http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/), I see several
> chromosomal FASTA files. Which chromosomal FASTA file should I
> download to get the correct nucleotides for the positions in the GTF file?
> That is, should I download the chromosome assembly sequence file (chromFa) or the
> repeat-masked one (chromFaMasked)?
>
> Thank you in advance.
>
> Sudeep.
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome