Re: [Rdkit-discuss] H-bond Acceptor problem

2008-11-03 Thread Greg Landrum
On Mon, Nov 3, 2008 at 4:19 AM, Robert DeLisle rkdeli...@gmail.com wrote:
 I could go with the AcceptorsPlusFluorines() function option.  Either way
 works for me.

 Another thought I had that would likely require more work would be to allow
 the import of custom definitions.  If I could import a set of SMARTS
 definitions, I could easily customize any pharmacophoric element without
 having to modify code.  This would at least partially avoid the problem of
 custom definitions breaking upon public distribution as long as the custom
 definitions were included along with any distribution of code.

If you're looking for a general-purpose mechanism for counting numbers
of SMARTS-defined features, there is one already present in
$RDBASE/Python/Chem/Fragments.py that you may be able to use. That
machinery reads a set of names, descriptions, and SMARTS-based feature
definitions from a text file -- $RDBASE/Data/FragmentDescriptors.csv
(it's a bad name, because that's a tab-separated file) by default --
and constructs the corresponding descriptor functions.
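
A minimal sketch of how such a table-driven mechanism can work. This is not the actual Fragments.py code: the column order (name, description, SMARTS), the helper name load_fragment_descriptors, and the injected count_matches callable are all assumptions made for illustration. With the RDKit one would presumably pass a callable that compiles the SMARTS and counts substructure matches; the toy matcher below only counts literal substrings so the example runs without any toolkit.

```python
import csv
import io


def load_fragment_descriptors(handle, count_matches):
    """Build {name: descriptor function} from a tab-separated definition file.

    Assumed (hypothetical) layout: name <TAB> description <TAB> SMARTS.
    `count_matches(mol, smarts)` is supplied by the caller.
    """
    descriptors = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with missing columns.
        if len(row) < 3 or row[0].startswith(("#", "//")):
            continue
        name, description, smarts = (field.strip() for field in row[:3])

        # Bind the current pattern through a default argument so that each
        # generated function keeps its own SMARTS.
        def descriptor(mol, _smarts=smarts):
            return count_matches(mol, _smarts)

        descriptor.__name__ = name
        descriptor.__doc__ = description
        descriptors[name] = descriptor
    return descriptors


# Toy demonstration: the "molecule" is a plain string and the matcher counts
# literal occurrences. A real matcher would do a substructure search.
definitions = io.StringIO(
    "# name\tdescription\tpattern\n"
    "fr_F\tNumber of fluorines\tF\n"
    "fr_carbonyl\tNumber of carbonyl groups\tC(=O)\n"
)
funcs = load_fragment_descriptors(definitions, lambda mol, patt: mol.count(patt))
print(funcs["fr_F"]("FC(=O)CF"))         # toy count of 'F'
print(funcs["fr_carbonyl"]("FC(=O)CF"))  # toy count of 'C(=O)'
```

Keeping the matcher as a parameter means the definition file can be shipped alongside distributed code without touching the loader itself.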

If you're looking for pharmacophoric point definitions (instead of
descriptors for use in QSAR and the like), then it's probably best to
look at the chemical feature functionality that uses the FDEF file
mentioned earlier on this thread.

-greg



Re: [Rdkit-discuss] H-bond Acceptor problem

2008-11-01 Thread Greg Landrum
On Wed, Oct 29, 2008 at 4:36 PM, Robert DeLisle rkdeli...@gmail.com wrote:
 Another 2 pence.  Nik is clearly hijacking my thoughts.

 I had the same thoughts on fluoro - include a flag that would allow/disallow
 counting fluorine at all, and reduce it to aromatic fluorine only.

Ok, I can go along with this. I'm going to skip the flag to add
fluorine; if it's useful the AcceptorsPlusFluorines (or something)
descriptor can be added.

 I opted
 to consider that I would modify the definition myself, but on further
 consideration that might be problematic if any of my code (or someone
 else's) becomes available for public consumption.  Differing definitions
 might create problems with performance or interpretation.

Agreed.

I've modified the definition of hydrogen bond acceptors to:
 HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0])\
]')

This change was checked in as rev871.

I am, of course, open to further discussion. :-)

-greg



Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-29 Thread Greg Landrum
On Tue, Oct 28, 2008 at 5:38 PM, Robert DeLisle rkdeli...@gmail.com wrote:
 I agree with Nik an additional 2 pence.  In fact, while reading Greg's
 original note, my thoughts were essentially identical to Nik's comments.

Excellent. Here's an altered proposal based on Nik's comments.

The definition of NumHAcceptors will be modified (modifications
discussed below). I won't make any changes to the NOCount or NHOHCount
descriptors or introduce new names for them. The new names would
conceivably break existing code and wouldn't really contribute to
clarity of future code, so the change doesn't seem worth making.

For the purposes of fixing the more complex HAcceptor descriptor I
propose the following SMARTS:
HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0]),\
$(F-a)]')

There are two changes here relative to the current definition: the
third line and the last one.
The third line includes nitrogens that have three neighbors and that
are not connected to another atom that has a non-ring double bond to
O, N, P, or S. The last line includes Fs that are connected to an
aromatic atom.

Comments?
-greg



Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-29 Thread nikolaus . stiefl
Hi,

maybe I have to rephrase a little.

With respect to fluoro - the best way to put it would be:

... if at all I would reduce it to aromatic fluoro only ...

Hence, personally I would leave the fluoro out of the general acceptor 
definition. I know there are cases where you find them, but the frequency is 
really not comparable to things like carbonyls or similar (i.e. the rest 
of your query). Maybe something like a useFluoro flag which is by 
default set to false?

Hope that clarifies things a little.

Nik




Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-28 Thread Greg Landrum
I wanted to make one more post on this topic, ask a couple questions
(at the bottom of the post), and give people a few days to comment
before I regenerate the regression test data and commit a change for
this bug.

On Wed, Oct 15, 2008 at 8:19 PM, Hans Purkey hans.pur...@gmail.com wrote:
 If the intention is to follow Lipinski's definitions of Hbond acceptors,
 then  it should be a simple N+O count (look back at the original paper and
 that is how he defined it for simplicity).

For those who are coming to this late, this is the NOCount()
descriptor, which is already present in the RDKit.

 However, if the descriptor is intended to match a more intuitive/realistic
 definition of HBA, then N-H shouldn't be a part of it.

I don't think I agree with this. There are plenty of cases of
nitrogens with attached Hs that act as H-bond acceptors (I did a CCD
search yesterday to be sure), but that's a side topic.

Back to the main topic: since these descriptors are all defined in a
module named Lipinski, and since this is all qualitative anyway, I'd
propose the following change:
The existing NumHDonors and NumHAcceptors (with fixes, discussed
below) be renamed to NumHDonorsAlt and NumHAcceptorsAlt and NOCount
and NHOHCount be aliased to NumHAcceptors and NumHDonors. I'd then
deprecate NOCount and NHOHCount (they will generate warnings when used
in the next release and then be completely removed in the release
after that).
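
A minimal sketch of the deprecation step proposed above, using only the standard warnings module. The deprecated_alias helper is hypothetical, and the toy NumHAcceptors takes a list of element symbols rather than an RDKit molecule, so this illustrates the pattern rather than the RDKit implementation.

```python
import functools
import warnings


def deprecated_alias(func, old_name):
    """Return a wrapper that calls `func` but warns that `old_name` is deprecated."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    # Report the old name when the alias itself is inspected.
    wrapper.__name__ = old_name
    return wrapper


def NumHAcceptors(symbols):
    """Toy Lipinski-style acceptor count: the number of N and O atoms.

    Takes a list of element symbols (an assumption for this sketch).
    """
    return sum(1 for s in symbols if s in ("N", "O"))


# The old name keeps working for a release, but warns whenever it is called.
NOCount = deprecated_alias(NumHAcceptors, "NOCount")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(NOCount(["C", "N", "C", "O", "C"]))
    print(caught[0].category.__name__)
```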

For the purposes of fixing the more complex HAcceptor descriptor I
propose the following SMARTS:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(N-*=!@[O,N,P,S])]),\
$([nH0,o,s;+0]),\
$([F;!$(F-*-F)])]')

There are two changes here: the third line and the last one.
The third line includes nitrogens that have three neighbors and that
are not connected to another atom that has a non-ring double bond to
O, N, P, or S.
The last line includes Fs that are not connected to another atom that
has more than one F attached (to exclude CF3 and CF2).

I realize these are not highly tuned, very detailed definitions like
those in the fdef file discussed elsewhere on this thread, but are
they acceptable for a qualitative descriptor?

So, the two questions:
1) Should the renaming mentioned above (i.e. the NumHAcceptor and
NumHDonor descriptors start returning the official Lipinski values
and the existing functions are renamed to NumHAcceptorAlt and
NumHDonorAlt) be done?
2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

Thanks for any feedback,
-greg



Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-28 Thread nikolaus . stiefl
Hi Greg,

maybe some comments on your suggestions. 

 1) Should the renaming mentioned above (i.e. the NumHAcceptor and
 NumHDonor descriptors start returning the official Lipinski values
 and the existing functions are renamed to NumHAcceptorAlt and
 NumHDonorAlt) be done?

Personally, I would guess that most people would not expect to receive an 
N/O count if they are asking for H-donors and acceptors. Hence, I would 
probably use a different naming convention that includes the Lipinski 
specification (e.g. LipNumHAcc or similar). That way people will not get 
confused by very high counts for those values.

 2) Is the above SMARTS reasonable for the more detailed HAcceptor 
definition?

As you say, they are very basic, but to me they look reasonable. If you 
actually want to tune them at a low level, then I would probably change the 
F definition to fluorines attached to aromatic rings only (I know there are 
a lot of papers out there that discuss this issue), but that's only me, and 
I would guess that over time people will fine-tune these definitions to 
their own liking anyway.

My 2 pence
Nik






Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-28 Thread Robert DeLisle
I agree with Nik, for an additional 2 pence.  In fact, while reading Greg's
original note, my thoughts were essentially identical to Nik's comments.

-Kirk




Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-15 Thread Greg Landrum
[heh, worse than sending a message without an attachment is hitting
send before the message is done and sending a message without text...
sorry]

On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle rkdeli...@gmail.com wrote:

 As you know, I've been working with descriptors in RDKit, and I think I've
 found a bug in the calculation of H-bond Acceptors.  Attached is an example
 structure, N-methyl-1H-indole-6-carboxamide.  When I calculate NumHAcceptors
 for this structure, I get 3.  I've looked at numerous other structures and it
 seems that nitrogens are always counted.  I went into the code and found the
 definitions used for HAcceptors:

Here's a simpler case showing the same behavior:
[15]  m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')

[16]  Lipinski.NumHAcceptors(m2)
Out[16]: 3

so that confirms the wrong count


 $([O,S;H1;v2]-[!$(*=[O,N,P,S])])
 $([O,S;H0;v2])
 $([O,S;-])
 $([Nv3;H1,H2]-[!$(*=[O,N,P,S])])
 $([N;v3;H0])
 $([n,o,s;+0])
 F

 Unless I'm misinterpreting the SMARTS (a very good possibility), both NH
 groups are being counted as an acceptor due to matching
 $([Nv3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be excluded
 according to this same definition?

[20]  m2.GetSubstructMatches(Chem.MolFromSmarts('[$([Nv3;H1,H2]-[!$(*=[O,N,P,S])])]'))
Out[20]: ((1,),)

That matches only one nitrogen: the amide nitrogen. The aromatic N
matches the second-to-last definition:
[29]  m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
Out[29]: ((6,),)

The problem is that the NH definition matches an N that is single
bonded to any atom that isn't doubly bonded to O, N, P, or S. It does not
exclude Ns that are also single bonded to an atom that is doubly bonded
to O, N, P, or S. So your amide with a secondary N matches via its
methyl carbon. The problem isn't the matcher, it's the definition.

Is that clear?
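
The distinction is "some neighbor is harmless" versus "no neighbor is a carbonyl-like atom". A toy sketch in plain Python follows. It does not use the RDKit or real SMARTS matching: the hand-written bond table covers only the first five atoms of CNC(=O)c1c[nH]cc1, the function names are hypothetical, and the hydrogen-count, valence, and ring-bond conditions of the real definitions are ignored.

```python
# Atoms 0-4 of CNC(=O)c1c[nH]cc1: methyl C, amide N, carbonyl C, O, ring C.
symbols = {0: "C", 1: "N", 2: "C", 3: "O", 4: "C"}

# Adjacency list: atom index -> [(neighbor index, bond order), ...]
bonds = {
    0: [(1, 1)],
    1: [(0, 1), (2, 1)],
    2: [(1, 1), (3, 2), (4, 1)],
    3: [(2, 2)],
    4: [(2, 1)],
}


def doubly_bonded_to_heteroatom(atom):
    """True if `atom` has a double bond to O, N, P, or S."""
    return any(
        order == 2 and symbols[nbr] in ("O", "N", "P", "S")
        for nbr, order in bonds[atom]
    )


def old_rule(n_atom):
    """Buggy reading: SOME single-bonded neighbor is not doubly bonded to O/N/P/S."""
    return any(
        order == 1 and not doubly_bonded_to_heteroatom(nbr)
        for nbr, order in bonds[n_atom]
    )


def stricter_rule(n_atom):
    """Intended reading: NO single-bonded neighbor is doubly bonded to O/N/P/S."""
    return not any(
        order == 1 and doubly_bonded_to_heteroatom(nbr)
        for nbr, order in bonds[n_atom]
    )


print(old_rule(1))       # True: the methyl carbon satisfies the old rule
print(stricter_rule(1))  # False: the carbonyl carbon rules the amide N out
```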

I agree that this is a bug in the definition and will fix it. Would
you mind entering the bug at sf.net or should I do it?

-greg



Re: [Rdkit-discuss] H-bond Acceptor problem

2008-10-15 Thread Robert DeLisle
Good point, Hans.

I see that within the available descriptors there are NHOHCount and NOCount,
which I assume are equivalent to Lipinski's Donors and Acceptors.  Also
there are NumHAcceptors and NumHDonors which I would expect to differentiate
themselves from the Lipinski versions in some way.

-Kirk




On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey hans.pur...@gmail.com wrote:

 If the intention is to follow Lipinski's definitions of Hbond acceptors,
 then  it should be a simple N+O count (look back at the original paper and
 that is how he defined it for simplicity).

 However, if the descriptor is intended to match a more intuitive/realistic
 definition of HBA, then N-H shouldn't be a part of it.

 Hans

