Re: [ccp4bb] Correcting 3-letter codes based on protonation states in a PDB file

2017-06-29 Thread Jared Sampson
Thanks to Dave, Tristan and David for the suggestions.  

The code Tristan posted is similar to what I had been thinking about doing, and 
probably will do in the future, although not being familiar with ChimeraX, I'll 
likely end up using Biopython.  

For this particular round, for expediency (and since it's been a while since I 
really used Biopython), I just went through the file residue-by-residue, in 
brute-force fashion using grep and Vim.  It wasn't as bad as I initially 
thought.

First I got a list of the side chain hydrogens in question:

grep -E "HD1 HIS|HE2 HIS" my_structure.pdb > his.txt

Then for each His residue, I checked the text file to see whether HD1 or HE2 or 
both were present, and modified the residue code with a substitution in Vim:

:%s /HIS A 251/HIP A 251/
etc.

It only took me about 15 minutes to do this for His/Asp/Glu for 2 PDBs, which 
is somewhat faster than I could have re-learned how to do it in one of the 
(clearly more appropriate) scripting languages.  Not exactly elegant, but it 
got the job done.  Of course, if it looks like I'll have to do this more 
frequently in the future, I'll likely go the scripting route next time.
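
A minimal sketch of what the scripting route could look like in plain Python, without Biopython or ChimeraX. It assumes a standard fixed-column PDB file in which the ring hydrogens are named HD1 and HE2. The names `rename_histidines` and `residue_key` are made up for illustration, and SEQRES records are not touched:

```python
from collections import defaultdict


def residue_key(line):
    """Chain ID, residue number and insertion code from the fixed PDB columns."""
    return (line[21:22], line[22:26], line[26:27].strip())


def rename_histidines(lines):
    """Return a copy of `lines` with HIS renamed to HID/HIE/HIP where possible."""
    # First pass: collect the atom names seen in each HIS residue.
    atoms = defaultdict(set)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == "HIS":
            atoms[residue_key(line)].add(line[12:16].strip())

    # Pick a new name per residue from which ring hydrogens are present.
    new_names = {}
    for key, names in atoms.items():
        hd1, he2 = "HD1" in names, "HE2" in names
        if hd1 and he2:
            new_names[key] = "HIP"
        elif hd1:
            new_names[key] = "HID"
        elif he2:
            new_names[key] = "HIE"
        # Neither hydrogen present: leave the residue as HIS.

    # Second pass: overwrite the residue-name columns (18-20, 1-based).
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM", "ANISOU", "TER")) and line[17:20] == "HIS":
            name = new_names.get(residue_key(line))
            if name:
                line = line[:17] + name + line[20:]
        out.append(line)
    return out
```

Reading `my_structure.pdb` with `readlines()`, passing the list through `rename_histidines`, and writing the result back out with `writelines()` would stand in for the grep-and-Vim pass for histidines. Unlike the ChimeraX snippet quoted below, a His with neither hydrogen is left as `HIS` rather than raising an error.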

Thanks everyone!

Cheers,
Jared

 

> On Jun 29, 2017, at 4:46 AM, Tristan Croll <ti...@cam.ac.uk> wrote:
> 
> This can be done in a few lines of script in any structural biology package 
> that provides a Python (or other) shell. Here's how I'd do it in ChimeraX, 
> for example:
> 
> Assuming your model is the only one loaded and atom names are all standard:
> 
> m = session.models.list()[0]
> histidines = m.residues.filter(m.residues.names == 'HIS')
> for his in histidines:
>     names = his.atoms.names
>     he2 = 'HE2' in names
>     hd1 = 'HD1' in names
>     if hd1 and not he2:
>         his.name = 'HID'
>     elif he2 and not hd1:
>         his.name = 'HIE'
>     elif hd1 and he2:
>         his.name = 'HIP'
>     else:
>         raise RuntimeError('HIS {}:{} is missing both hydrogens!'.format(
>             his.chain_id, his.number))
> 
> Cheers,
> 
> Tristan
> 
> On 2017-06-29 07:09, Briggs, David C wrote:
>> I believe the ProPka or Pdb2pqr webservers can do this.
>> ProPka.org <http://propka.org/>
>> http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/ [1]
>> HTH,
>> Dave
>> --
>> Dr David C Briggs
>> Hohenester Lab
>> Department of Life Sciences
>> Imperial College London
>> UK
>> http://about.me/david_briggs [2]
>> -----
>> FROM: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
>> Sampson, Jared <jms2...@cumc.columbia.edu>
>> SENT: Wednesday, June 28, 2017 11:34:05 PM
>> TO: CCP4BB@JISCMAIL.AC.UK
>> SUBJECT: [ccp4bb] Correcting 3-letter codes based on protonation
>> states in a PDB file
>> Dear all -
>> I'm working with a PDB file with explicit hydrogens where many of the
>> histidines are in protonated form due to crystallization at low pH.
>> Unfortunately, although the additional protons are present in the
>> model for the positively charged histidines, the residues in question
>> are indicated in both the SEQRES and the ATOM records as 3-letter code
>> `HIS` regardless of protonation state (i.e. instead of `HIP` for
>> positively charged, and `HID` or `HIE` for the neutral tautomers).
>> Are there existing tools available to determine the proper 3-letter
>> residue code for titratable amino acid residues based on which
>> hydrogens are present, and output a corrected PDB file?
>> Thank you in advance for your suggestions.
>> Cheers,
>> Jared Sampson
>> Ph.D. Candidate
>> Columbia University
>> Links:
>> --
>> [1] http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/
>> [2] http://about.me/david_briggs


Re: [ccp4bb] Correcting 3-letter codes based on protonation states in a PDB file

2017-06-29 Thread Tristan Croll
This can be done in a few lines of script in any structural biology 
package that provides a Python (or other) shell. Here's how I'd do it in 
ChimeraX, for example:


Assuming your model is the only one loaded and atom names are all 
standard:


m = session.models.list()[0]
histidines = m.residues.filter(m.residues.names == 'HIS')
for his in histidines:
    names = his.atoms.names
    # Which ring nitrogens carry a hydrogen?
    he2 = 'HE2' in names
    hd1 = 'HD1' in names
    if hd1 and not he2:
        his.name = 'HID'
    elif he2 and not hd1:
        his.name = 'HIE'
    elif hd1 and he2:
        his.name = 'HIP'
    else:
        raise RuntimeError('HIS {}:{} is missing both hydrogens!'.format(
            his.chain_id, his.number))
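
The same decision can also be written as a small table-driven function that covers Asp and Glu as well. This is only a sketch in plain Python, not ChimeraX API: `protonation_name` and `CARBOXYL_RULES` are hypothetical names, and it assumes the AMBER-style names `ASH`/`GLH` with the carboxyl proton called `HD2` (Asp) or `HE2` (Glu), so check the hydrogen naming in your file first.

```python
# Parent residue -> (hydrogen that marks the protonated form, new residue name).
# Assumption: AMBER-style naming, proton on OD2 (Asp) or OE2 (Glu).
CARBOXYL_RULES = {
    "ASP": ("HD2", "ASH"),
    "GLU": ("HE2", "GLH"),
}


def protonation_name(resname, atom_names):
    """Return the residue name implied by the hydrogens present.

    Residues that are not handled, or that carry no deciding
    hydrogen, keep their original name.
    """
    names = set(atom_names)

    if resname == "HIS":
        hd1, he2 = "HD1" in names, "HE2" in names
        if hd1 and he2:
            return "HIP"  # both ring nitrogens protonated, positively charged
        if hd1:
            return "HID"  # neutral tautomer, proton on ND1
        if he2:
            return "HIE"  # neutral tautomer, proton on NE2
        return "HIS"      # no ring hydrogens modelled

    if resname in CARBOXYL_RULES:
        hydrogen, protonated = CARBOXYL_RULES[resname]
        return protonated if hydrogen in names else resname

    return resname
```

Inside the ChimeraX loop above, something like `his.name = protonation_name(his.name, his.atoms.names)` should then replace the if/elif chain, with the difference that a histidine without either hydrogen is left unchanged instead of raising.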

Cheers,

Tristan

On 2017-06-29 07:09, Briggs, David C wrote:

I believe the ProPka or Pdb2pqr webservers can do this.

 ProPka.org

 http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/ [1]

 HTH,

 Dave

 --

 Dr David C Briggs

 Hohenester Lab

 Department of Life Sciences

 Imperial College London

 UK

 http://about.me/david_briggs [2]

-

FROM: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
Sampson, Jared <jms2...@cumc.columbia.edu>
 SENT: Wednesday, June 28, 2017 11:34:05 PM
 TO: CCP4BB@JISCMAIL.AC.UK
 SUBJECT: [ccp4bb] Correcting 3-letter codes based on protonation
states in a PDB file

Dear all -

 I'm working with a PDB file with explicit hydrogens where many of the
histidines are in protonated form due to crystallization at low pH.
Unfortunately, although the additional protons are present in the
model for the positively charged histidines, the residues in question
are indicated in both the SEQRES and the ATOM records as 3-letter code
`HIS` regardless of protonation state (i.e. instead of `HIP` for
positively charged, and `HID` or `HIE` for the neutral tautomers).

 Are there existing tools available to determine the proper 3-letter
residue code for titratable amino acid residues based on which
hydrogens are present, and output a corrected PDB file?

 Thank you in advance for your suggestions.

 Cheers,

 Jared Sampson
 Ph.D. Candidate
 Columbia University

Links:
--
[1] http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/
[2] http://about.me/david_briggs


Re: [ccp4bb] Correcting 3-letter codes based on protonation states in a PDB file

2017-06-29 Thread Briggs, David C
I believe the ProPka or Pdb2pqr webservers can do this.

ProPka.org

http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/

HTH,

Dave

--
Dr David C Briggs
Hohenester Lab
Department of Life Sciences
Imperial College London
UK
http://about.me/david_briggs


From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of Sampson, Jared 
<jms2...@cumc.columbia.edu>
Sent: Wednesday, June 28, 2017 11:34:05 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Correcting 3-letter codes based on protonation states in a 
PDB file

Dear all -

I'm working with a PDB file with explicit hydrogens where many of the 
histidines are in protonated form due to crystallization at low pH.  
Unfortunately, although the additional protons are present in the model for the 
positively charged histidines, the residues in question are indicated in both 
the SEQRES and the ATOM records as 3-letter code `HIS` regardless of 
protonation state (i.e. instead of `HIP` for positively charged, and `HID` or 
`HIE` for the neutral tautomers).

Are there existing tools available to determine the proper 3-letter residue 
code for titratable amino acid residues based on which hydrogens are present, 
and output a corrected PDB file?

Thank you in advance for your suggestions.

Cheers,

Jared Sampson
Ph.D. Candidate
Columbia University


[ccp4bb] Correcting 3-letter codes based on protonation states in a PDB file

2017-06-28 Thread Sampson, Jared
Dear all - 

I'm working with a PDB file with explicit hydrogens where many of the 
histidines are in protonated form due to crystallization at low pH.  
Unfortunately, although the additional protons are present in the model for the 
positively charged histidines, the residues in question are indicated in both 
the SEQRES and the ATOM records as 3-letter code `HIS` regardless of 
protonation state (i.e. instead of `HIP` for positively charged, and `HID` or 
`HIE` for the neutral tautomers).  

Are there existing tools available to determine the proper 3-letter residue 
code for titratable amino acid residues based on which hydrogens are present, 
and output a corrected PDB file?

Thank you in advance for your suggestions.

Cheers,

Jared Sampson
Ph.D. Candidate
Columbia University