Re: [Rdkit-discuss] H-bond Acceptor problem
On Mon, Nov 3, 2008 at 4:19 AM, Robert DeLisle rkdeli...@gmail.com wrote:
> I could go with the AcceptorsPlusFluorines() function option. Either way works for me.
>
> Another thought I had, which would likely require more work, would be to allow the import of custom definitions. If I could import a set of SMARTS definitions, I could easily customize any pharmacophoric element without having to modify code. This would at least partially avoid the problem of custom definitions breaking upon public distribution, as long as the custom definitions were included along with any distribution of code.

If you're looking for a general-purpose mechanism for counting numbers of SMARTS-defined features, there is one already present in $RDBASE/Python/Chem/Fragments.py that you may be able to use. That machinery reads a set of names, descriptions, and SMARTS-based feature definitions from a text file -- $RDBASE/Data/FragmentDescriptors.csv by default (it's a bad name, because that's a tab-separated file) -- and constructs the corresponding descriptor functions.

If you're looking for pharmacophoric point definitions (instead of descriptors for use in QSAR and the like), then it's probably best to look at the chemical feature functionality that uses the FDEF file mentioned earlier on this thread.

-greg
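The idea of reading names, descriptions, and SMARTS from a tab-separated file and turning each row into a counting function can be sketched as follows. This is an illustration only: it does not use Fragments.py, the three-column layout is an assumption about the file format, and `DEFINITIONS` and `load_feature_counters` are hypothetical names.

```python
import csv
import io

from rdkit import Chem

# Hypothetical tab-separated definitions: name, description, SMARTS.
# The layout of the real FragmentDescriptors.csv may differ.
DEFINITIONS = (
    "n_carbonyl_O\tcarbonyl oxygens\t[OX1]=[#6]\n"
    "n_aryl_F\tfluorines on aromatic atoms\t[F;$(F-a)]\n"
)


def load_feature_counters(text):
    """Build {name: function(mol) -> match count} from tab-separated rows."""
    counters = {}
    for name, _description, smarts in csv.reader(io.StringIO(text), delimiter="\t"):
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is None:
            raise ValueError(f"invalid SMARTS for {name!r}: {smarts}")
        # Bind the pattern as a default argument so each function keeps its own.
        counters[name] = lambda mol, p=pattern: len(mol.GetSubstructMatches(p))
    return counters


counters = load_feature_counters(DEFINITIONS)
mol = Chem.MolFromSmiles("CNC(=O)c1ccc(F)cc1")
counts = {name: count(mol) for name, count in counters.items()}
print(counts)
```

Shipping the definitions text alongside the code keeps customized counts reproducible for anyone who receives both.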
Re: [Rdkit-discuss] H-bond Acceptor problem
On Wed, Oct 29, 2008 at 4:36 PM, Robert DeLisle rkdeli...@gmail.com wrote:
> Another 2 pence. Nik is clearly hijacking my thoughts. I had the same thoughts on fluoro - include a flag that would allow/disallow counting fluorine at all, and reduce it to aromatic fluorine only.

Ok, I can go along with this. I'm going to skip the flag to add fluorine; if it's useful the AcceptorsPlusFluorines (or something) descriptor can be added.

> I opted to consider that I would modify the definition myself, but on further consideration that might be problematic if any of my code (or someone else's) becomes available for public consumption. Differing definitions might create problems with performance or interpretation.

Agreed. I've modified the definition of hydrogen bond acceptors to:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0])\
]')

This change was checked in as rev871. I am, of course, open to further discussion. :-)

-greg
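As a quick check, the revised definition can be applied to the test molecule from the original bug report. In this sketch the nitrogen term is written as `N-*=!@[O,N,P,S]`, which is how the prose description "a non-ring double bond to O, N, P, or S" reads in SMARTS; the variable names are illustrative.

```python
from rdkit import Chem

# Revised acceptor definition, joined into a single string.
H_ACCEPTOR_SMARTS = (
    "[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),"
    "$([O,S;H0;v2]),$([O,S;-]),"
    "$([N;v3;!$(N-*=!@[O,N,P,S])]),"
    "$([nH0,o,s;+0])]"
)
pattern = Chem.MolFromSmarts(H_ACCEPTOR_SMARTS)

# The test molecule used in the bug report further down the thread.
mol = Chem.MolFromSmiles("CNC(=O)c1c[nH]cc1")
acceptors = mol.GetSubstructMatches(pattern)
print(acceptors)       # ((3,),): only the carbonyl oxygen
print(len(acceptors))  # 1
```

The amide nitrogen (atom 1) is rejected because its neighbor carries a non-ring C=O, and the pyrrole-type nitrogen (atom 6) is rejected because it bears a hydrogen.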
Re: [Rdkit-discuss] H-bond Acceptor problem
On Tue, Oct 28, 2008 at 5:38 PM, Robert DeLisle rkdeli...@gmail.com wrote:
> I agree with Nik; an additional 2 pence. In fact, while reading Greg's original note, my thoughts were essentially identical to Nik's comments.

Excellent. Here's an altered proposal based on Nik's comments.

The definition of NumHAcceptors will be modified (modifications discussed below). I won't make any changes to the NOCount or NHOHCount descriptors or introduce new names for them. The new names would conceivably break existing code and wouldn't really contribute to clarity of future code, so the change doesn't seem worth making.

For the purposes of fixing the more complex HAcceptor descriptor I propose the following SMARTS:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0]),\
$(F-a)]')

There are two changes here relative to the current definition: the third line and the last one. The third line includes nitrogens that have three neighbors and that are not connected to another atom that has a non-ring double bond to O, N, P, or S. The last line includes Fs that are connected to an aromatic atom.

Comments?

-greg
Re: [Rdkit-discuss] H-bond Acceptor problem
Hi,

maybe I have to rephrase a little. With respect to fluoro - the best way to put it would be: ... if at all I would reduce it to aromatic fluoro only ...

Hence, personally I would leave the fluoro out of the general acceptor definition. I know there are cases where you find them, but the frequency is really not comparable to things like carbonyls or similar (i.e. the rest of your query). Maybe something like a useFluoro flag which is by default set to false?

Hope that clarifies things a little.

Nik
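The flag suggested here could look roughly like the sketch below. It is not an RDKit function: `count_h_acceptors` and `use_fluoro` are hypothetical names, and the terms are those of the proposal under discussion, with the nitrogen term written as `N-*=!@[O,N,P,S]` to match its prose description.

```python
from rdkit import Chem

# Alternatives of the acceptor definition discussed in this thread.
BASE_TERMS = [
    "$([O,S;H1;v2]-[!$(*=[O,N,P,S])])",
    "$([O,S;H0;v2])",
    "$([O,S;-])",
    "$([N;v3;!$(N-*=!@[O,N,P,S])])",
    "$([nH0,o,s;+0])",
]
AROMATIC_FLUORINE = "$(F-a)"  # fluorine bonded to an aromatic atom


def count_h_acceptors(mol, use_fluoro=False):
    """Count acceptor atoms; aromatic fluorines are included only on request."""
    terms = BASE_TERMS + ([AROMATIC_FLUORINE] if use_fluoro else [])
    pattern = Chem.MolFromSmarts("[" + ",".join(terms) + "]")
    return len(mol.GetSubstructMatches(pattern))


mol = Chem.MolFromSmiles("Fc1ccc(cc1)C(F)(F)F")  # one aryl F plus a CF3 group
without_f = count_h_acceptors(mol)
with_f = count_h_acceptors(mol, use_fluoro=True)
print(without_f, with_f)  # 0 1
```

With the default the count is unchanged for every molecule, so existing results stay comparable; only callers who opt in see the aryl fluorine counted.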
Re: [Rdkit-discuss] H-bond Acceptor problem
I wanted to make one more post on this topic, ask a couple of questions (at the bottom of the post), and give people a few days to comment before I regenerate the regression test data and commit a change for this bug.

On Wed, Oct 15, 2008 at 8:19 PM, Hans Purkey hans.pur...@gmail.com wrote:
> If the intention is to follow Lipinski's definitions of Hbond acceptors, then it should be a simple N+O count (look back at the original paper and that is how he defined it for simplicity).

For those who are coming to this late, this is the NOCount() descriptor, which is already present in the RDKit.

> However, if the descriptor is intended to match a more intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.

I don't think I agree with this. There are plenty of cases of nitrogens with attached Hs that act as H-bond acceptors (I did a CCD search yesterday to be sure), but that's a side topic.

Back to the main topic: since these descriptors are all defined in a module named Lipinski, and since this is all qualitative anyway, I'd propose the following change:

The existing NumHDonors and NumHAcceptors (with fixes, discussed below) be renamed to NumHDonorsAlt and NumHAcceptorsAlt, and NOCount and NHOHCount be aliased to NumHAcceptors and NumHDonors. I'd then deprecate NOCount and NHOHCount (they will generate warnings when used in the next release and then be completely removed in the release after that).

For the purposes of fixing the more complex HAcceptor descriptor I propose the following SMARTS:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0]),\
$([F;!$(F-*-F)])]')

There are two changes here: the third line and the last one. The third line includes nitrogens that have three neighbors and that are not connected to another atom that has a non-ring double bond to O, N, P, or S. The last line includes Fs that are not connected to another atom that has more than one F attached (to exclude CF3 and CF2).

I realize these are not highly tuned, very detailed definitions like those in the fdef file discussed elsewhere on this thread, but are they acceptable for a qualitative descriptor?

So, the two questions:

1) Should the renaming mentioned above (i.e. the NumHAcceptor and NumHDonor descriptors start returning the official Lipinski values and the existing functions are renamed to NumHAcceptorAlt and NumHDonorAlt) be done?

2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

Thanks for any feedback,
-greg
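To see what the fluorine term `[F;!$(F-*-F)]` accepts and rejects, it can be tried on its own against a few small molecules. This is a sketch; the example molecules are chosen here for illustration and do not come from the thread.

```python
from rdkit import Chem

# Fluorine whose neighbour does not carry a second fluorine.
fluorine_term = Chem.MolFromSmarts("[$([F;!$(F-*-F)])]")

examples = {
    "fluoroethane": "CCF",
    "1,1-difluoroethane": "CC(F)F",
    "1,1,1-trifluoroethane": "CC(F)(F)F",
    "fluorobenzene": "Fc1ccccc1",
    "1,2-difluorobenzene": "Fc1ccccc1F",
}

counts = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)
    counts[name] = len(mol.GetSubstructMatches(fluorine_term))
    print(name, counts[name])
```

Single alkyl and aryl fluorines are counted, CF2 and CF3 groups are not, and the two fluorines of 1,2-difluorobenzene both count because they sit on different carbons.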
Re: [Rdkit-discuss] H-bond Acceptor problem
Hi Greg,

maybe some comments on your suggestions.

> 1) Should the renaming mentioned above (i.e. the NumHAcceptor and NumHDonor descriptors start returning the official Lipinski values and the existing functions are renamed to NumHAcceptorAlt and NumHDonorAlt) be done?

Personally, I would guess that most people would not expect to receive an N/O count if they are asking for H-donors and acceptors. Hence, I would probably use a different naming convention that includes the Lipinski specification (e.g. LipNumHAcc or similar). That way people will not get confused by very high counts for those values.

> 2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

As you say - they are very basic, but to me they look reasonable. If you actually want to tune them at a low level, then I would probably change the F definition to fluoros attached to aromatic rings only (I know there are a lot of papers out there that discuss this issue), but that's only me, and I would guess that over time people should fine-tune these definitions to their own liking anyway.

My 2 pence
Nik
Re: [Rdkit-discuss] H-bond Acceptor problem
I agree with Nik; an additional 2 pence. In fact, while reading Greg's original note, my thoughts were essentially identical to Nik's comments.

-Kirk
Re: [Rdkit-discuss] H-bond Acceptor problem
[heh, worse than sending a message without an attachment is hitting send before the message is done and sending a message without text... sorry]

On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle rkdeli...@gmail.com wrote:
> As you know, I've been working with descriptors in RDKit, and I think I've found a bug in the calculation of H-bond Acceptors. Attached is an example structure, N-methyl-1H-indole-6-carboxamide. When I calculate NumHAcceptors for this structure, I get 3. I've looked at numerous other structures and it seems that nitrogens are always counted. I went into the code and found the definitions used for HAcceptors:

Here's a simpler case showing the same behavior:

In [15]: m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
In [16]: Lipinski.NumHAcceptors(m2)
Out[16]: 3

so that confirms the wrong count.

> $([O,S;H1;v2]-[!$(*=[O,N,P,S])])
> $([O,S;H0;v2])
> $([O,S;-])
> $([Nv3;H1,H2]-[!$(*=[O,N,P,S])])
> $([N;v3;H0])
> $([n,o,s;+0])
> F
>
> Unless I'm misinterpreting the SMARTS (a very good possibility), both NH groups are being counted as an acceptor due to matching $([Nv3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be excluded according to this same definition?

In [20]: m2.GetSubstructMatches(Chem.MolFromSmarts('[$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])]'))
Out[20]: ((1,),)

Only matches one nitrogen... the amide nitrogen. The aromatic N matches the second-to-last definition:

In [29]: m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
Out[29]: ((6,),)

The problem is that the first nitrogen definition matches an N that is single bonded to an atom that isn't doubly bonded to O, N, P, or S. It does not exclude Ns that are single bonded to an atom that is doubly bonded to O, N, P, or S. So your amide with a secondary N matches through its N-methyl carbon. The problem isn't the matcher, it's the definition. Is that clear?

I agree that this is a bug in the definition and will fix it. Would you mind entering the bug at sf.net or should I do it?

-greg
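The difference between "has some neighbor without a double bond" and "has no neighbor with a double bond" can be checked directly. In the sketch below, `strict_term` is only one illustrative way to express the intended exclusion; it is not a definition taken from RDKit.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CNC(=O)c1c[nH]cc1")

# Original nitrogen term: satisfied by ANY single-bonded neighbour that has
# no double bond to O, N, P or S. The N-methyl carbon (atom 0) qualifies.
old_term = Chem.MolFromSmarts("[$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])]")

# Illustrative stricter term: reject the N if any neighbour carries such a
# double bond.
strict_term = Chem.MolFromSmarts("[$([Nv3;H1,H2]);!$(N-*=[O,N,P,S])]")

old_matches = mol.GetSubstructMatches(old_term)
strict_matches = mol.GetSubstructMatches(strict_term)
print(old_matches)     # ((1,),): the amide N is still matched
print(strict_matches)  # (): the amide N is excluded
```

Moving the negation from the neighbor atom to the nitrogen itself is what turns an "exists" test into a "for all neighbors" test.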
Re: [Rdkit-discuss] H-bond Acceptor problem
Good point, Hans. I see that within the available descriptors there are NHOHCount and NOCount, which I assume are equivalent to Lipinski's Donors and Acceptors. Also there are NumHAcceptors and NumHDonors, which I would expect to differentiate themselves from the Lipinski versions in some way.

-Kirk

On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey hans.pur...@gmail.com wrote:
> If the intention is to follow Lipinski's definitions of Hbond acceptors, then it should be a simple N+O count (look back at the original paper and that is how he defined it for simplicity). However, if the descriptor is intended to match a more intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.
>
> Hans
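For comparison, the simple Lipinski-style counts can be written as two short SMARTS queries and run on the thread's test molecule. This sketch does not call NOCount or NHOHCount; the patterns are assumptions about what those counts mean, and the donor pattern counts each N or O atom that bears at least one hydrogen (an NH2 group counts once).

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CNC(=O)c1c[nH]cc1")

# Atomic numbers are used so aromatic and aliphatic atoms are both included.
n_plus_o = Chem.MolFromSmarts("[#7,#8]")      # every nitrogen or oxygen
nh_or_oh = Chem.MolFromSmarts("[#7,#8;!H0]")  # N or O with at least one H

no_count = len(mol.GetSubstructMatches(n_plus_o))
nhoh_count = len(mol.GetSubstructMatches(nh_or_oh))
print(no_count)    # 3: amide N, carbonyl O, ring N
print(nhoh_count)  # 2: amide NH and ring NH
```

The N+O count of 3 equals the value reported as a bug for NumHAcceptors on this molecule, which is why the thread separates the simple Lipinski count from a more selective acceptor definition.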