Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record

2012-07-06 Thread Stiefl, Nikolaus
Hi JP,
Not sure if this is of any help. If it's a PDB file from the RCSB or an
in-house one where you have a corresponding SMILES available, maybe you
could use this information to set up the bond types properly using bond
matches? I know the components.cif file still has quite a few errors;
however, maybe it could be of help.
Cheers
Nik



On 7/6/12 5:04 AM, Greg Landrum greg.land...@gmail.com wrote:

Hi JP,

On Thu, Jul 5, 2012 at 7:07 PM, JP jeanpaul.ebe...@inhibox.com wrote:


 I generate an RWMol instance from the HETATM portion of a PDB file. My atoms
 are currently joined only by single bonds, as defined in the CONECT portion
 of the PDB file, e.g.

 CONECT 2235 2234 2236
 CONECT 2236 2231 2235 2251
 CONECT 2237 2238 2242

ah, yes, the missing bond orders, one of the many reasons that I have
never done a PDB parser for the RDKit. :-S

I think you're doing this work in C++, so I'm going to answer the rest
of the questions accordingly.

 Are there any obvious RDKit ways to detect:

 0. rings

Sure.

If you just want to know whether each atom/bond is in a ring, you can use
MolOps::fastFindRings() and then check
mol.getRingInfo()->numAtomRings(idx) > 0
or mol.getRingInfo()->numBondRings(idx) > 0.

If you want to know what the SSSR rings are, then you should use
MolOps::symmetrizeSSSR(). You can pass that an extra argument in which it
will return the rings as defined by atom indices. After calling this,
you can also get the set of atom rings using
mol.getRingInfo()->atomRings() or the bond rings with
mol.getRingInfo()->bondRings().
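
As a library-independent illustration of the ring question (this is not how RDKit finds rings, and it does not use the RDKit API), ring membership can be read straight off the CONECT graph: a bond is in a ring exactly when its two atoms remain connected after that bond is ignored. The sketch below assumes standard fixed-width CONECT records; the names Graph, parseConect, bondInRing and atomInRing are made up for this example.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Adjacency list keyed by PDB atom serial number.
using Graph = std::map<int, std::set<int>>;

// Parse CONECT records (serial in columns 7-11, bonded serials in the
// following 5-character fields) into an undirected graph.
Graph parseConect(const std::vector<std::string> &lines) {
  Graph g;
  for (const std::string &line : lines) {
    if (line.compare(0, 6, "CONECT") != 0) continue;
    std::vector<int> serials;
    for (std::size_t pos = 6; pos < line.size(); pos += 5) {
      const std::string field = line.substr(pos, 5);
      // Skip blank or non-numeric fields (padding, line endings).
      if (field.find_first_of("0123456789") == std::string::npos) continue;
      serials.push_back(std::stoi(field));
    }
    for (std::size_t i = 1; i < serials.size(); ++i) {
      g[serials[0]].insert(serials[i]);
      g[serials[i]].insert(serials[0]);
    }
  }
  return g;
}

// A bond a-b is in a ring exactly when b is still reachable from a
// once that single bond is ignored (breadth-first search).
bool bondInRing(const Graph &g, int a, int b) {
  if (!g.count(a) || !g.at(a).count(b)) return false;  // not a bond
  std::set<int> seen{a};
  std::queue<int> todo;
  todo.push(a);
  while (!todo.empty()) {
    const int cur = todo.front();
    todo.pop();
    for (int nbr : g.at(cur)) {
      if (cur == a && nbr == b) continue;  // skip the bond under test
      if (nbr == b) return true;
      if (seen.insert(nbr).second) todo.push(nbr);
    }
  }
  return false;
}

// An atom is in a ring if any of its bonds is a ring bond.
bool atomInRing(const Graph &g, int a) {
  const auto it = g.find(a);
  if (it == g.end()) return false;
  for (int nbr : it->second) {
    if (bondInRing(g, a, nbr)) return true;
  }
  return false;
}
```

This only answers the yes/no question per atom or bond; it does not enumerate rings the way symmetrizeSSSR() does.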

 1. aromatic rings/atoms
 2. double/triple bonds
 3. charges (if any)

Here's where the trouble starts.

I guess you want to perceive the bond types and atom hybridizations
from the geometry. From there you can get the charges. The RDKit does
not currently have anything to do this. There was a discussion on the
mailing list last year:
http://comments.gmane.org/gmane.science.chemistry.rdkit.user/85
where Geoff Hutchison very kindly offered to donate the Open Babel
bond perception code to the RDKit. He sent the code, but I've never
had the time to port it from Open Babel to RDKit. If you're
interested in implementing this and are willing to do it in a way
that could be integrated into the main RDKit, I can send you the
donated code; it's about 300 lines of well-commented C++.


 I would like to set these properties on every atom instance contained in my
 RWMol - so I generate a correct molecule representation.
 I assume sanitize would not clean these up for me? Correct?

Correct. Sanitize uses the bond information that's there.

-greg

--

Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record

2012-07-06 Thread JP
Stiefl -- that is a really crafty and less painful way to go about it.

The SMILES is, in fact, a wget call away, since I am processing
PDB-deposited structures.
http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

Of course, this approach is not generally applicable (for the general case
you'd need the perception code Greg mentioned).
Can I have a look at that code, just to see the amount of work involved,
if nothing else?

For the protein I can somehow guess the bonds.  SanitizeMol takes around
20 mins on my machine (I assume this is because of the massive size of the
molecule ~ 2000 atoms).

-
Jean-Paul Ebejer
Early Stage Researcher





Re: [Rdkit-discuss] Announcement: 1st RDKit User Group Meeting scheduled

2012-07-06 Thread JP
I just noticed that this is a user meeting and not a dev one - so
perhaps such a topic is out of scope...

-
Jean-Paul Ebejer
Early Stage Researcher


On 4 July 2012 16:52, Greg Landrum greg.land...@gmail.com wrote:

 On Wed, Jul 4, 2012 at 3:38 PM, JP jeanpaul.ebe...@inhibox.com wrote:
  A suggestion, if I'm allowed.

 suggestions are always allowed. They may, of course, be ignored. ;-)

  I am trying to build a PDB file parser to return an rdkit mol (representing
  a protein) from a pdb file.
 
  I have battled and partially won the glue code between C++ and python. This
  was by far the most difficult bit of extending rdkit. But there have been
  casualties and I am not sure I 100% understand what is going on (how is my
  code automatically residing in the Chem package? why do you need this struct
  anyway?). If I may suggest a hands-on/tutorial session at the RDKit user
  meeting on this -- I think it will be a topic worth broaching. If there is
  enough interest, a free slot in the programme and someone with the
  capabilities to explain, of course.

 Capabilities to explain is, of course, not a problem. I'm also happy
 to do it, but the topic is pretty specialized, so I wonder how many
 others would be interested.

 -greg



Re: [Rdkit-discuss] Announcement: 1st RDKit User Group Meeting scheduled

2012-07-06 Thread Greg Landrum
On Fri, Jul 6, 2012 at 11:06 AM, JP jeanpaul.ebe...@inhibox.com wrote:
 I just noticed that this is a user meeting and not a dev one - so
 perhaps such a topic is out of scope...


eh, that depends. C++ users of the code are also users. I'm more
concerned about having a session that only 2 people are interested in.
To accommodate that we would have to do breakout sessions, and we
hadn't planned to do such a thing.

I will see if I can come up with a sensible demo that shows how to
create a new python extension module using RDKit functionality. This
should at least make some stuff easier for others in the future.

-greg



Re: [Rdkit-discuss] Announcement: 1st RDKit User Group Meeting scheduled

2012-07-06 Thread Adrian Jasiński
I have a question related to this topic:
Is there a preliminary agenda for the meeting, or a list of topics that
will be discussed during this event?
If so, where can I find it?
If not, will it be announced, and when?

I suggest publishing a list of topics and voting for the most
interesting ones.



Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record

2012-07-06 Thread Adrian Schreyer
On Fri, Jul 6, 2012 at 10:27 AM, JP jeanpaul.ebe...@inhibox.com wrote:
 Stiefl -- that is a really crafty and less painful way to go about it.

 The SMILES is, in fact, a wget call away, since I am processing
 PDB-deposited structures.
 http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

Do you want ideal ligand structures or those from the experimental PDB
file? The ideal ones you can just download from PDBeChem (SDF). The
Credo web service has a REST resource for downloading ligand
structures in different formats (including SDF) - hopefully I will
have a web server soon to make this publicly available.

 Of course, this approach is not generally applicable (and for this you'd
 need the perception code Greg mentioned).
 Can I have a look at that code, just to see the amount of work involved --
 if nothing else?

Just remember that those ligand structures are often far from perfect
- the most common problems are missing atoms and dodgy geometries that
affect bond order perception. Here are the two ligands (FRG) from PDB
entry 1M48 for example:

COC(=O)[C@H](Cc1ccc(cc1)C#Cc2c2)NC(=O)C[C@H]3CCCN(C3)C(=[NH2+])N
COC(=O)[C@H](Cc1ccc(cc1)C#Cc2c2)NC(=O)C[C@H]3CC=CN(C3)C(=[NH2+])N

The second one has a double bond because one of the rings does not
have the same geometry as it does in the other ligand.
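
To illustrate why a slightly distorted ring can change the perceived bond order, here is a deliberately naive sketch that classifies a carbon-carbon bond by the nearest reference length. It is not the Open Babel perception code discussed in this thread, and it ignores angles, hybridization and every element other than carbon. The reference lengths are approximate textbook values, and Point, bondLength and guessCarbonBondType are names invented for this example.

```cpp
#include <cmath>
#include <string>

struct Point {
  double x, y, z;
};

// Euclidean distance between two atom positions, in angstroms.
double bondLength(const Point &a, const Point &b) {
  const double dx = a.x - b.x;
  const double dy = a.y - b.y;
  const double dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

struct RefLength {
  const char *label;
  double length;  // angstroms
};

// Pick the bond type whose reference C-C length is closest to d.
// Assumption: approximate textbook lengths, carbon-carbon bonds only.
std::string guessCarbonBondType(double d) {
  static const RefLength refs[] = {
      {"single", 1.54}, {"aromatic", 1.40}, {"double", 1.34}, {"triple", 1.20}};
  const RefLength *best = &refs[0];
  for (const RefLength &r : refs) {
    if (std::fabs(d - r.length) < std::fabs(d - best->length)) best = &r;
  }
  return best->label;
}
```

With these reference values a bond measured at 1.48 A is classified as single while one at 1.46 A is classified as aromatic, so a coordinate error of a few hundredths of an angstrom is enough to flip the result.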

 For the protein I can somehow guess the bonds.  SanitizeMol takes around
 20 mins on my machine (I assume this is because of the massive size of the
 molecule ~ 2000 atoms).
 Jean-Paul Ebejer
 Early Stage Researcher

It might be easier to use Open Babel to load the PDB structure and
extract just the binding site, which you could save in a different
format and use afterwards in RDKit.



Re: [Rdkit-discuss] Announcement: 1st RDKit User Group Meeting scheduled

2012-07-06 Thread Paul Emsley
  On 06/07/12 10:06, JP wrote:
 I just noticed that this is a user meeting and not a dev one - so 
 perhaps such a topic is out of scope...

What's the difference?

A user uses python and a dev uses python, boost.python and c++?

Anyway, I too (AFAICS ATM) would be interested in your suggestion (FWIW).

Paul.




Re: [Rdkit-discuss] Announcement: 1st RDKit User Group Meeting scheduled

2012-07-06 Thread JP
IMHO I think the main difference is that a user is a consumer of the API
(so just calling the available methods), while a developer is one who
builds new functionality and extends RDKit's API and internals.  That the
user in this case is also a software developer somewhat blurs this
distinction.

The python wrapper is an example of something which a developer of RDKit
would be interested in.  But for the end user of RDKit, be it C++ or
python, this layer is transparent and of no interest.

HAGD!

-
Jean-Paul Ebejer
Early Stage Researcher




Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record

2012-07-06 Thread Paul Emsley
On 06/07/12 10:27, JP wrote:
 Stiefl -- that is a really crafty and less painful way to go about it.

 The SMILES is, in fact, a wget call away, since I am processing
 PDB-deposited structures.
 http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

It is not clear to me how the SMILES helps you.  You still have to map 
between rdkit atoms and PDB atom names, do you not? How about using the 
monomer library?

http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/Dictionary/dictionary.html
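
Whichever source of bond orders is used, the PDB atom names have to be kept alongside the atoms so they can be matched against a dictionary entry. A minimal sketch of that bookkeeping, assuming standard fixed-width HETATM records (serial in columns 7-11, atom name in columns 13-16, residue name in columns 18-20); HetAtom, parseHetatm and atomNamesBySerial are hypothetical names, not RDKit functions.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct HetAtom {
  int serial;           // atom serial number, as used by CONECT records
  std::string name;     // PDB atom name, e.g. "C1"
  std::string resName;  // residue (ligand) name, e.g. "NAG"
};

// Strip leading and trailing spaces from a fixed-width field.
std::string trim(const std::string &s) {
  const std::size_t first = s.find_first_not_of(' ');
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(' ');
  return s.substr(first, last - first + 1);
}

// Fill `out` from one HETATM line; returns false for any other record.
// std::stoi throws on a malformed serial field; a real parser would
// want to handle that case.
bool parseHetatm(const std::string &line, HetAtom &out) {
  if (line.size() < 20 || line.compare(0, 6, "HETATM") != 0) return false;
  out.serial = std::stoi(line.substr(6, 5));
  out.name = trim(line.substr(12, 4));
  out.resName = trim(line.substr(17, 3));
  return true;
}

// Map serial number -> atom name for every HETATM record in `lines`.
std::map<int, std::string> atomNamesBySerial(
    const std::vector<std::string> &lines) {
  std::map<int, std::string> names;
  HetAtom atom;
  for (const std::string &line : lines) {
    if (parseHetatm(line, atom)) names[atom.serial] = atom.name;
  }
  return names;
}
```

Keying the names by serial number means they line up directly with the serials in the CONECT records, so each bond can be looked up by its pair of atom names.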


 Of course, this approach is not generally applicable (and for this 
 you'd need the perception code Greg mentioned).

Hmm... I am unconvinced that you want to be doing chemistry perception.



 For the protein I can somehow guess the bonds.

What do you mean by guess here?  Are you worried about histidine 
protonation?



