***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***




Bottomley, Matthew wrote:


Hello,
could anyone tell me how to get the single-letter amino acid sequence out of
a PDB file, e.g. in fasta format?

If you have (g)awk and a Fortran compiler, you can:

awk '$1 == "ATOM" && $3 == "CA" {print $4}' a.pdb \
  | $for/seqconv > a.seq


where $for/seqconv is compiled from the code below:





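        character*3 code3(23),a
        character*1 code1(23),b(60)
        integer i,j,n
        data code1 /'A','V','L','I','P','F','W',
     . 'M','G','S','T','C','Y',
     . 'N','Q','D','E','K','R','H','B','Z','X'/
        data code3 /'ALA','VAL','LEU','ILE','PRO',
     . 'PHE','TRP','MET','GLY','SER','THR','CYS',
     . 'TYR','ASN','GLN','ASP','GLU','LYS','ARG',
     . 'HIS','ASX','GLX','XXX'/

c       read one three-letter code per line, 60 residues per output line
50      do 90 i=1,60
c       end=100 catches end of input; err= alone need not trap it
        read(5,51,end=100,err=100) a
51      format(a3)
c       default to X so an unknown residue never leaves b(i) undefined
        b(i)='X'
        do 80 j=1,23
        if (a.eq.code3(j)) b(i)=code1(j)
80      continue
90      continue
c       explicit format: no leading blank or padding in the output
        write(6,52) (b(i),i=1,60)
52      format(60a1)
        goto 50
c       end of input: write the partial last line, if there is one
100     n=i-1
        if (n.gt.0) write(6,52) (b(i),i=1,n)
        end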
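
If you would rather skip the compile step, the same conversion can be sketched in awk alone. Treat this as a minimal sketch and not a tested tool: the function name pdb2fasta is made up for illustration, it assumes a standard fixed-column PDB file, it reads only up to the first ENDMDL, it concatenates all chains into one record, and it writes X for any residue name outside the table above. Unlike the pipeline above it also adds a ">" header line, using the file name as the title.

```shell
# pdb2fasta: hypothetical helper name, not part of CCP4.
# Prints one FASTA record built from the CA atoms of the ATOM records.
pdb2fasta() {
    awk -v name="$1" '
    BEGIN {
        # three-letter to one-letter lookup, same pairs as the Fortran table
        n = split("ALA VAL LEU ILE PRO PHE TRP MET GLY SER THR CYS TYR ASN GLN ASP GLU LYS ARG HIS ASX GLX", t3, " ")
        split("A V L I P F W M G S T C Y N Q D E K R H B Z", t1, " ")
        for (k = 1; k <= n; k++) one[t3[k]] = t1[k]
    }
    # stop after the first model of a multi-model file
    /^ENDMDL/ { exit }
    # fixed columns: record name 1-6, atom name 13-16, residue name 18-20
    substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " {
        # chain + residue number + insertion code; skip alternate-location repeats
        key = substr($0, 22, 6)
        if (key == prev) next
        prev = key
        res = substr($0, 18, 3)
        seq = seq ((res in one) ? one[res] : "X")
    }
    END {
        print ">" name
        # wrap the sequence at 60 residues per line
        for (k = 1; k <= length(seq); k += 60) print substr(seq, k, 60)
    }' "$1"
}
```

Usage would be something like: pdb2fasta a.pdb > a.seq. Matching on columns instead of whitespace-separated fields means a calcium HETATM (atom name "CA  ") is not mistaken for an alpha carbon (" CA "), and lines where neighbouring fields run together still parse.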
        

