Good Morning Sudeep:

Use the .2bit file; it contains all of the sequence and is masked.
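Select the coordinates (columns 1, 4 and 5) from your GTF file to make
a BED file. GTF starts are 1-based and BED starts are 0-based, so
subtract 1 from column 4:

awk 'BEGIN{FS=OFS="\t"} {print $1,$4-1,$5,$3}' yourFile.gtf > coordinates.bed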
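
As a small worked illustration of this conversion (a sketch, not part of the original instructions): GTF start positions are 1-based while BED starts are 0-based, so the version here subtracts 1 from column 4 and writes tab-separated output. The file names sample.gtf and sample.bed and the two records are made up for the example.

```shell
# Hypothetical two-record GTF, tab-separated, for illustration only.
printf 'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "g1";\n'  > sample.gtf
printf 'chr2\tsrc\tCDS\t1\t50\t.\t-\t.\tgene_id "g2";\n'     >> sample.gtf

# Skip comment lines; GTF start (1-based) becomes BED start (0-based).
# Column 3 (the feature type) is carried along as the BED name field.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $4 - 1, $5, $3 }' sample.gtf > sample.bed

cat sample.bed
```

The end coordinate is left unchanged because a BED end is exclusive, which makes it numerically equal to the inclusive GTF end.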

Then use the twoBitToFa program to extract those sequences from the .2bit file:

twoBitToFa -bed=coordinates.bed susScr2.2bit result.fa
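
One quick sanity check afterwards, sketched here under the assumption that twoBitToFa writes one FASTA record per BED line: compare the number of BED lines with the number of ">" header lines in the output. The helper name check_counts is hypothetical and not part of any UCSC tool.

```shell
# check_counts BED FASTA: hypothetical helper; succeeds when the number of
# non-empty BED lines equals the number of FASTA header lines.
check_counts() {
    bed_n=$(grep -c . "$1" || true)     # non-empty BED lines
    fa_n=$(grep -c '^>' "$2" || true)   # FASTA header lines
    echo "bed=$bed_n fasta=$fa_n"
    [ "$bed_n" -eq "$fa_n" ]
}

# Run only if the files from the steps above are present.
if [ -f coordinates.bed ] && [ -f result.fa ]; then
    check_counts coordinates.bed result.fa || echo "record counts differ" >&2
fi
```

A mismatch would suggest that some BED lines were skipped or malformed and is worth investigating before using result.fa.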

--Hiram

sudeep s wrote:
> Dear Mailing list,
> 
> I have a list of interesting nucleotide coordinate positions from a UCSC
> GTF file (organism: Sus scrofa, build: Sscrofa9.2). I now want the
> corresponding sequences for those coordinates. Since my list is
> fairly large, I plan to download the chromosome sequences and get the
> sequences through ad hoc scripting or EMBOSS tools. But when I look at the
> sequence and annotation download page
> (http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/), I see several
> chromosomal FASTA files. Which chromosomal FASTA file should I
> download to get the correct nucleotides for the positions in the GTF file?
> That is, should I download the chromosome assembly sequence file (chromFa) or the
> repeat-masked one (chromFaMasked)?
> 
> Thank you in advance.
> 
> Sudeep.
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
