Please note this off-by-one correction:
awk '{print $1,$4-1,$5,$3}' yourFile.gtf > coordinates.bed
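As a small illustration of why the subtraction is needed: GTF start
coordinates are 1-based, while BED start coordinates are 0-based and the
BED end is exclusive, so only the start column shifts. The sketch below
uses a made-up one-line file, sample.gtf, whose contents are hypothetical
and not taken from the susScr2 annotation:

```shell
# Hypothetical one-line GTF record for illustration only;
# real input would be yourFile.gtf.
printf 'chr1\tsusScr2_refGene\texon\t1001\t1200\t0.0\t+\t.\tgene_id "X"; transcript_id "X";\n' > sample.gtf

# GTF starts are 1-based; BED starts are 0-based, so subtract 1 from column 4.
# Column 5 is left alone because a BED end is exclusive.
awk '{print $1,$4-1,$5,$3}' sample.gtf > sample.bed

cat sample.bed
# prints: chr1 1000 1200 exon

# Sanity check: the BED length (end - start) should equal the
# GTF feature length (end - start + 1), here 200 bases.
awk '{print $3-$2}' sample.bed
# prints: 200
```

Without the subtraction the BED interval would be 199 bases long and would
miss the first base of the feature.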
And if you are obtaining GTF files from the table browser,
it would be much easier to have the table browser supply
the BED file format directly, without the awkward,
error-prone detour through GTF format.
Thank you, Angie, for the noted corrections.
--Hiram
Hiram Clawson wrote:
> Good Morning Sudeep:
>
> Use the .2bit file; it has all the sequence and is masked.
> Select out the coordinates (columns 1,4,5) from your GTF file to make
> a BED file:
>
> awk '{print $1,$4,$5,$3}' yourFile.gtf > coordinates.bed
>
> Then use the twoBitToFa program to extract those sequences from the
> .2bit file
>
> twoBitToFa -bed=coordinates.bed susScr2.2bit result.fa
>
> --Hiram
>
> sudeep s wrote:
>> Dear Mailing list,
>>
>> I have a list of interesting nucleotide coordinate positions from a
>> UCSC GTF file (Organism: Sus scrofa, build: Sscrofa9.2). I now want
>> the corresponding sequences for those coordinates. Since my list is
>> fairly large, I plan to download the chromosome sequences and
>> get the sequences through ad hoc scripting or EMBOSS tools. But when I
>> look at the sequence & annotation download page
>> (http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/), I see
>> several chromosomal fasta files. In that case, which chromosomal fasta
>> file should I download to get the correct nucleotides for the
>> positions in the GTF file? That is, should I download the chromosome
>> assembly sequence file (chromFa) or the repeat-masked one
>> (chromFaMasked)?
>> Thank you in advance.
>>
>> Sudeep.
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome