On Tuesday 03 October 2006 01:00 pm, Bottomley, Matthew wrote:
> could anyone tell me how to get the single-letter amino acid sequence
> out of a PDB file, e.g. in fasta format?
No, but here's a Perl script that goes the other way, from FASTA to
PDB SEQRES records. It might do just as well for your purposes.
#!/usr/bin/perl -w
#
# Read a FASTA format sequence and convert it to PDB SEQRES records
#
# EAM May 2005
#
my %aacode = (
"A","ALA",
"R","ARG",
"N","ASN",
"D","ASP",
"C","CYS",
"Q","GLN",
"E","GLU",
"G","GLY",
"H","HIS",
"I","ILE",
"L","LEU",
"K","LYS",
"M","MET",
"F","PHE",
"P","PRO",
"S","SER",
"T","THR",
"W","TRP",
"Y","TYR",
"V","VAL"
);
# Concatenate the sequence lines, skipping FASTA header lines (which start with ">")
while (<>) {
    if (m/^>/) {next}
    chomp;
    $seq .= $_;
}
$seq =~ tr/a-z/A-Z/;
$seqlen = length($seq);
$line = 0;
$chain = "A";
print "$seqlen residues\n";
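# Expand each one-letter code to its three-letter name plus a trailing space
$seq =~ s/./$aacode{$&} /g;
# Print 13 residues (52 characters) per SEQRES record. PDB columns:
# serial number 8-10, chain ID 12, residue count 14-17, names from column 20
$seq =~ s/(.{4,52})/printf "SEQRES %3d %1s %4d  $1\n",
    ++$line, $chain, $seqlen/ge;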
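
For the direction originally asked about, the same lookup table can be
inverted. What follows is only a minimal sketch, not a tested tool. It
assumes the PDB file actually contains SEQRES records in the standard
fixed-column layout, and it writes any residue name outside the 20 standard
amino acids as X. The sub name seqres_to_fasta and the ">chain_A" style
header are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: PDB SEQRES records -> FASTA, one entry per chain.
# Assumes standard fixed-column SEQRES records; unknown residues become X.
use strict;
use warnings;

my %three2one = (
    ALA => "A", ARG => "R", ASN => "N", ASP => "D", CYS => "C",
    GLN => "Q", GLU => "E", GLY => "G", HIS => "H", ILE => "I",
    LEU => "L", LYS => "K", MET => "M", PHE => "F", PRO => "P",
    SER => "S", THR => "T", TRP => "W", TYR => "Y", VAL => "V",
);

# Takes the lines of a PDB file and returns the FASTA text as one string.
sub seqres_to_fasta {
    my (%seq, @order);
    for my $rec (@_) {
        next unless $rec =~ /^SEQRES/;
        chomp(my $line = $rec);
        next unless length($line) >= 19;      # too short to hold residues
        my $chain = substr($line, 11, 1);     # column 12: chain ID
        unless (exists $seq{$chain}) {
            push @order, $chain;              # remember first-seen order
            $seq{$chain} = "";
        }
        # Residue names start in column 20, separated by spaces
        for my $res (split " ", substr($line, 19)) {
            $seq{$chain} .= exists $three2one{$res} ? $three2one{$res} : "X";
        }
    }
    my $out = "";
    for my $chain (@order) {
        # Wrap the sequence at 60 letters per line
        (my $wrapped = $seq{$chain}) =~ s/(.{1,60})/$1\n/g;
        $out .= ">chain_$chain\n$wrapped";
    }
    return $out;
}

# Run only when a file name is given, so the sub can also be loaded on its own
print seqres_to_fasta(<>) if @ARGV;
```

Run it with the PDB file name as the argument. Note that SEQRES lists the
sequence as deposited, which can differ from the residues actually present
in the ATOM records.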
--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle WA