***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***


On Tuesday 03 October 2006 01:00 pm, Bottomley, Matthew wrote:
> could anyone tell me how to get the single-letter amino acid sequence
> out of a PDB file, e.g. in fasta format?

No, but here's a perl script that goes the other way.
It might do just as well for your purposes.

#!/usr/bin/perl -w
#
# Read a FASTA format sequence and convert it to PDB SEQRES records
#
# EAM May 2005
#

my %aacode = (
 "A","ALA", 
 "R","ARG", 
 "N","ASN", 
 "D","ASP", 
 "C","CYS", 
 "Q","GLN", 
 "E","GLU", 
 "G","GLY", 
 "H","HIS", 
 "I","ILE", 
 "L","LEU", 
 "K","LYS", 
 "M","MET", 
 "F","PHE", 
 "P","PRO", 
 "S","SER", 
 "T","THR", 
 "W","TRP", 
 "Y","TYR", 
 "V","VAL" 
 );

while (<>) {
  if (m/^>/) {next}    # skip FASTA header lines, which start with a single ">"
  chomp;
  $seq .= $_;
}
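# Normalize to upper case and drop anything that is not a letter
# (stray whitespace, carriage returns, gap or stop symbols)
$seq =~ tr/a-z/A-Z/;
$seq =~ s/[^A-Z]//g;
$seqlen = length($seq);
$line = 0;
$chain = "A";
print "$seqlen residues\n";
# Expand each letter to its three-letter code; unrecognized letters become UNK
$seq =~ s/(.)/($aacode{$1} || "UNK") . " "/ge;
# Emit up to 13 residues (52 characters) per SEQRES record.
# The residue count is a 4-column field (columns 14-17) in the PDB format.
$seq =~ s/(.{4,52})/printf "SEQRES %3d %1s %4d  %s\n",
                          ++$line, $chain, $seqlen, $1/ge;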
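
For the direction originally asked about (PDB to FASTA), a minimal sketch along the same lines might look like the following. It is not part of the script above. It assumes the PDB file carries SEQRES records, which describe the full construct rather than only the residues actually modelled in the ATOM records. The sub name seqres_to_fasta and the ">chain_X" header style are made up for illustration, and anything outside the 20 standard residues (MSE, nucleotides, ligands) is written as X.

```perl
#!/usr/bin/perl -w
#
# Sketch: read PDB SEQRES records and return one FASTA entry per chain.
# Hypothetical helper, not part of the original FASTA-to-SEQRES script.
#
use strict;

# Three-letter to one-letter codes for the 20 standard amino acids
my %aa1 = qw(
  ALA A  ARG R  ASN N  ASP D  CYS C  GLN Q  GLU E  GLY G  HIS H  ILE I
  LEU L  LYS K  MET M  PHE F  PRO P  SER S  THR T  TRP W  TYR Y  VAL V
);

sub seqres_to_fasta {
  my @lines = @_;
  my (%seq, @order);
  foreach my $rec (@lines) {
    next unless $rec =~ /^SEQRES/;
    $rec =~ s/\s+$//;
    next if length($rec) < 20;
    my $chain = substr($rec, 11, 1);      # column 12 holds the chain ID
    $chain = "_" if $chain eq " ";        # blank chain ID
    unless (exists $seq{$chain}) {
      push @order, $chain;                # remember first-seen chain order
      $seq{$chain} = "";
    }
    # Residue names start at column 20 and are separated by blanks
    foreach my $res (split ' ', substr($rec, 19)) {
      $res = uc $res;
      $seq{$chain} .= exists $aa1{$res} ? $aa1{$res} : "X";
    }
  }
  my $out = "";
  foreach my $chain (@order) {
    $out .= ">chain_$chain\n";
    # Wrap the sequence at 60 letters per line
    (my $wrapped = $seq{$chain}) =~ s/(.{1,60})/$1\n/g;
    $out .= $wrapped;
  }
  return $out;
}

# Convert the files named on the command line, if any
print seqres_to_fasta(<>) if @ARGV;
```

Saved under a name of your choosing (say pdb2fasta.pl), it would be run as "perl pdb2fasta.pl model.pdb > model.fasta". For files that lack SEQRES records the sequence would have to be collected from the ATOM records instead, which this sketch does not attempt.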


-- 
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle WA
