Hi Iain,

Here's a Perl script that takes as arguments:
clustalw_aln_file reference_seq_name pdb_file min_B max_B
and calculates a consensus score for each alignment column as the fraction
of sequences carrying the most-represented amino acid. The score is written
to the B-factor column, scaled so that min_B corresponds to 100%
conservation and max_B to 0%.
Multiple chains are handled as long as the reference sequence in the
alignment accurately reflects the PDB file sequence, e.g. no 'ACE' residues,
waters as HETATMs etc., although allowances could of course easily be made
for this.
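
To make the scoring concrete, here is a minimal standalone sketch of the
calculation for a single column. The helper names column_score and
score_to_b are made up for this illustration and do not appear in the
script; the sketch assumes, as the script does, that gaps count towards the
number of sequences but never as the most-represented residue.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fraction of sequences that carry the most common
# non-gap residue in one alignment column. Gaps ('-') are included in the
# denominator but can never be the "winning" residue.
sub column_score {
    my @column = @_;
    my %count;
    my $top = 0;
    for my $res (@column) {
        next if $res eq '-';
        $count{$res}++;
        $top = $count{$res} if $count{$res} > $top;
    }
    return $top / @column;
}

# Hypothetical helper: linear mapping of a score in [0,1] onto the B-factor
# range, so that score 1 gives min_B and score 0 gives max_B.
sub score_to_b {
    my ($score, $min, $max) = @_;
    return ($max - $min) * (1 - $score) + $min;
}

# Example column: three sequences have A, one has G, one has a gap.
my $score = column_score(qw(A A A G -));
printf "%.2f %6.2f\n", $score, score_to_b($score, 0, 100);
```

With min_B=0 and max_B=100, this column scores 3/5 = 0.6 and maps to a
B-factor of 40, so well-conserved positions end up with low values.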

Cheers,
James

#!/usr/bin/perl

($seqin,$refsq,$pdbin,$min,$max)=@ARGV;
#B-factor column scaling
if ($max<=$min) { $min=0; $max=100 }
#Load CLUSTALW file
open(I,$seqin) || die "Alignment file not found.\n";
$h=1; while (<I>) {
 if ($_=~/CLUSTAL/) {
   $h=0;
 } elsif (($h==0)&&($_=~/^(\w+)\s+([A-Za-z\-]+)/)) {
   $sq{$1}.=$2;
 }
}
close(I);
$n=0;
foreach $s (values %sq) {
 if (length($s) != length($sq{$refsq})) {
   die "Non-identical length or no reference sequence.\n";
 }
 $n++;
}
if ($n<2) {
 die "Too few sequences.\n";
}
#determine consensus score from CLUSTALW file
$n=0;
for ($i=0; $i<length($sq{$refsq}); $i++) {
 if (substr($sq{$refsq},$i,1)=~/\w/) {
   $n++; %aa=();  $t=0; $tn=0;
   foreach $s (values %sq) {
     if (++$aa{($a=substr($s,$i,1))}>$t) {
       if ($a ne "-") { $t=$aa{$a} }
     }
     $tn++;
   }
   $cons[$n]=$t/$tn;
 }
}
#Apply to b-factor column
$n=0; $ore=$nre="";
open(I,$pdbin) || die "PDB file not found.\n";
while (<I>) {
 if ($_=~/^ATOM/) {
   #Compare chain ID + residue number + insertion code (PDB columns 22-27)
   if ((($nre=substr($_,21,6)) ne $ore)) {
     $ore=$nre; $n++;
   }
   #Truncate where the number of residues in the structure
   #exceeds the number of residues in the alignment
   if ($n<@cons) {
     print substr($_,0,60).
           sprintf("%6.2f",($max-$min)*(1-$cons[$n])+$min).
       substr($_,66);
   }
 } else {
   print $_;
 }
}
close(I);



On 30/01/07, Iain Kerr <[EMAIL PROTECTED]> wrote:

I'm trying to colour a molecular surface by sequence conservation... (sorry,
I think I incorrectly posted this to COOTBB the other day)

I've figured out how to do it in GRASP - modify the B-factor column in the
PDB file to represent the percentage conservation and then colour the
surface by B-factor. I know ESPRIPT will make the modified file, but I'm
having trouble generating the correct one..

I am providing ESPRIPT (expert mode, '%Equivalent' scoring function) with a
CLUSTALW alignment (Aligned sequences > Main alignment file) and a PDB file
('Aligned sequences' > 'Supplementary pdb' file). I get the error:

'Fatal error: wrong format in PDB file.'

..and the values in the B-factor (%Equivalent) column are all either 99 or
100 which is nonsense according to the alignment.

Has anyone come across this? I don't see anything wrong with my PDB file.

Thanks,
Iain




--
Dr. James Irving
NH&MRC C.J. Martin Fellow
Division of Structural Biology
Wellcome Trust Centre for Human Genetics
Oxford University
Roosevelt Drive,
Oxford OX3 7BN
UK
email: [EMAIL PROTECTED]
phone: +44 1865 287 550
