*** For details on how to be removed from this list visit the ***
*** CCP4 home page http://www.ccp4.ac.uk ***
Bottomley, Matthew wrote:
Hello,
could anyone tell me how to get the single-letter amino acid sequence out of
a PDB file, e.g. in fasta format?
If you have (g)awk and a Fortran compiler, you can:
awk '$1 == "ATOM" && $3 == "CA" {print $4}' a.pdb \
 | $for/seqconv > a.seq
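where $for/seqconv is compiled (e.g. gfortran -o seqconv seqconv.f) from the
code below. It writes the bare one-letter sequence with no header line, so add
a '>name' line at the top to make it FASTA: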
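      character*3 code3(23),a
      character*1 code1(23),b(60)
      data code1 /'A','V','L','I','P','F','W',
     .            'M','G','S','T','C','Y',
     .            'N','Q','D','E','K','R','H','B','Z','X'/
      data code3 /'ALA','VAL','LEU','ILE','PRO',
     .            'PHE','TRP','MET','GLY','SER','THR','CYS',
     .            'TYR','ASN','GLN','ASP','GLU','LYS','ARG',
     .            'HIS','ASX','GLX','XXX'/
C     Read one residue name per line; write 60 codes per line.
C     End of file is caught with END=, not ERR=.
   50 do 90 i=1,60
         read(5,51,end=100) a
   51    format(a3)
C        Unrecognised residue names (e.g. MSE) come out as X
         b(i)='X'
         do 80 j=1,23
            if (a.eq.code3(j)) b(i)=code1(j)
   80    continue
   90 continue
      write(6,'(60a1)') (b(i),i=1,60)
      goto 50
C     End of input: write whatever is left in the last line
  100 n=i-1
      if (n.gt.0) write(6,'(60a1)') (b(i),i=1,n)
      end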
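If you would rather skip the Fortran step, the same lookup table can be done in awk alone. The sketch below is only an illustration: the function name pdb2fasta is made up, not an existing tool. It reads fixed PDB columns instead of whitespace-separated fields, so an alternate-location flag stuck to the residue name does not shift the fields. It assumes standard ATOM records, keeps only altLoc ' ' or 'A', stops at the first ENDMDL, maps anything not in the table to X, and takes the FASTA header from the file name:

```shell
# pdb2fasta FILE: print a FASTA record built from the CA atoms of a PDB file.
# Hypothetical helper, not an existing program. Usage: pdb2fasta a.pdb > a.fasta
pdb2fasta() {
    awk -v name="$(basename "$1" .pdb)" '
    BEGIN {
        # Same 3-letter -> 1-letter table as the Fortran program
        t = "ALA VAL LEU ILE PRO PHE TRP MET GLY SER THR CYS TYR"
        t = t " ASN GLN ASP GLU LYS ARG HIS ASX GLX"
        n = split(t, three, " ")
        split("A V L I P F W M G S T C Y N Q D E K R H B Z", one, " ")
        for (k = 1; k <= n; k++) code[three[k]] = one[k]
    }
    # Keep only the first model of a multi-model (e.g. NMR) file
    /^ENDMDL/ { exit }
    # PDB columns: 13-16 atom name, 17 altLoc, 18-20 residue name
    /^ATOM/ && substr($0, 13, 4) == " CA " && substr($0, 17, 1) ~ /[ A]/ {
        res = substr($0, 18, 3)
        aa = (res in code) ? code[res] : "X"   # unknown names -> X
        seq = seq aa
    }
    END {
        print ">" name
        # 60 residues per line, as in the Fortran version
        for (k = 1; k <= length(seq); k += 60) print substr(seq, k, 60)
    }' "$1"
}
```

Then pdb2fasta a.pdb > a.fasta. All chains go into one record; splitting per chain (column 22) would need a little more work.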