Re: [Rdkit-discuss] Molecule reading issues

2014-06-01 Thread Toby Wright
If you just want to ignore the error add a try...catch block around the
offending line.
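
As a minimal sketch of that advice, written in Python rather than Java so it runs without the wrapper: each conversion is wrapped in a try/except, so a bad record is noted and skipped instead of ending the run. The helper `skip_bad_records` and the converter `fake_convert` are hypothetical stand-ins for the real call (for example `rdmol.MolToSmiles()`); neither is part of RDKit.

```python
def skip_bad_records(records, convert):
    """Apply convert to each record; collect failures instead of raising."""
    results = []
    skipped = []
    for index, record in enumerate(records):
        try:
            results.append(convert(record))
        except Exception as exc:  # deliberately broad: the goal is only to move on
            skipped.append((index, str(exc)))
    return results, skipped


def fake_convert(record):
    # Hypothetical converter that rejects one kind of input, for demonstration.
    if record == "bad":
        raise ValueError("cannot sanitize record")
    return record.upper()


results, skipped = skip_bad_records(["cco", "bad", "ccn"], fake_convert)
print(results)  # ['CCO', 'CCN']
print(skipped)  # [(1, 'cannot sanitize record')]
```

In Java the same shape applies: put the call inside the try block and catch the exception type shown in the stack trace.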

Yours,

Toby Wright


On 31 May 2014 00:03, Matthew Lardy mla...@gmail.com wrote:

 Hi all,

 I am having this issue with the Java wrapper while trying to create a
 smiles string from a RWMol class object.  I don't care about trying to
 figure out what is going wrong, I just want to bypass this record without
 my application closing.  Any ideas?

 Here is the offending line:

 rdmol.MolToSmiles();

 The error:

 Exception in thread main org.RDKit.MolSanitizeException
 at org.RDKit.RDKFuncsJNI.RWMol_MolFromSmiles__SWIG_3(Native Method)
 at org.RDKit.RWMol.MolFromSmiles(RWMol.java:422)

 Thanks in advance!
 Matt


Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MMFF94 atom typing OHs connected to aromatic heterocycles.

2014-04-16 Thread Toby Wright
Thanks very much, Paolo.

Yours,

Toby

--
InhibOx Ltd


On 15 April 2014 00:25, Paolo Tosco paolo.to...@unito.it wrote:

  Dear Toby,

 I checked the MMFF literature and indeed that hydrogen must be type 29; I
 just submitted a pull request with the bug fix. Apparently the test case
 that you presented is not covered by the validation suite and so I missed
 that bug until today: thank you very much for reporting it!

 Cheers,
 p.






 --
 ==
 Paolo Tosco, Ph.D.
 Department of Drug Science and Technology
 Via Pietro Giuria, 9 - 10125 Torino (Italy)
 Tel: +39 011 670 7680 | Mob: +39 348 5537206
 Fax: +39 011 670 7687 | E-mail: paolo.tosco@unito.it
 http://open3dqsar.org | http://open3dalign.org
 ==





[Rdkit-discuss] MMFF94 atom typing OHs connected to aromatic heterocycles.

2014-04-14 Thread Toby Wright
Hi,

I've been using the MMFF94 force field and noticed odd behaviour in a
couple of molecules.
 phenolish1 = Chem.MolFromSmiles('Oc1ncccn1')
 phenolish2 = Chem.MolFromSmiles('Oc1ncncc1')
 prop1 = AllChem.MMFFGetMoleculeProperties(Chem.AddHs(phenolish1))
 prop2 = AllChem.MMFFGetMoleculeProperties(Chem.AddHs(phenolish2))
 print(prop1.GetMMFFAtomType(7))  # atom 7 is the H of the OH
21
 print(prop2.GetMMFFAtomType(7))  # atom 7 is the H of the OH
29

Type 29 is a hydrogen attached to an oxygen in enols, phenols or HO-C=N
which is only sort of the case here (but perhaps pragmatically we should
consider phenol the closest option?)
Digging into the code in GraphMol/ForceFieldHelpers/MMFF/AtomTyper.cpp we
see (between lines 2092 and 2133) that we consider an atom to hit the
phenol case if we have the oxygen attached to a carbon attached via an
aromatic bond to another carbon. We have this in phenolish2 but not in
phenolish1 hence the different outputs. If we change the test on line 2115
to:

if ((bond->getBondType() == Bond::AROMATIC) || ((nbr3Atom->getAtomicNum()
== 6) && (bond->getBondType() == Bond::DOUBLE))) {

then both cases above show the same behaviour, considering phenolish things
to be phenols for the sake of MMFF94 atom typing. Alternatively we could
consider phenolish things to be not phenols and implement atom type 21 for
the hydrogen in both cases. Any thoughts?
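
To make the two candidate rules concrete, here is a toy model in plain Python. It is not the RDKit source: the function names are made up, the "current" rule is only a paraphrase of the behaviour described above, and type 21 is used as the fallback simply because that is what the first example reports. Each tuple describes one bond from the carbon bearing the OH to a neighbour other than the oxygen, as (bond kind, neighbour atomic number).

```python
AROMATIC, DOUBLE, SINGLE = "aromatic", "double", "single"


def is_phenol_like_current(bonds):
    # Paraphrase of the described behaviour: aromatic bond to another carbon.
    return any(kind == AROMATIC and z == 6 for kind, z in bonds)


def is_phenol_like_proposed(bonds):
    # Mirrors the proposed test: any aromatic bond, or a double bond to carbon.
    return any(kind == AROMATIC or (z == 6 and kind == DOUBLE) for kind, z in bonds)


def hydroxyl_h_type(bonds, rule):
    # 29 for the enol/phenol-like hydrogen, 21 as the generic fallback here.
    return 29 if rule(bonds) else 21


phenolish1 = [(AROMATIC, 7), (AROMATIC, 7)]  # Oc1ncccn1: both ring neighbours are N
phenolish2 = [(AROMATIC, 7), (AROMATIC, 6)]  # Oc1ncncc1: one N and one C neighbour

for name, bonds in (("phenolish1", phenolish1), ("phenolish2", phenolish2)):
    print(name,
          hydroxyl_h_type(bonds, is_phenol_like_current),
          hydroxyl_h_type(bonds, is_phenol_like_proposed))
# phenolish1 21 29
# phenolish2 29 29
```

The toy reproduces the reported split (21 versus 29) under the paraphrased rule and gives 29 for both under the proposed one.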

Yours,

Toby Wright

PS I'm aware that the tautomers above aren't ideal, these are fragments
snipped from more complex molecules to demonstrate the behaviour.

--
InhibOx Ltd


Re: [Rdkit-discuss] [H] vs [2H] in reactions

2014-04-09 Thread Toby Wright
Hi,

Don't worry, in my real cases I'm using [OH:1][C:2] (or similar depending
on the reaction in question). I guess I made my simplest possible example
too simple to be a useful communication. Adding the ;D1 will also protect
against molecules specified as [1H]OC but hopefully my third party input
data doesn't have anything so odd. My intuition was that [OH]C, [H]OC and
[2H]OC were the same in terms of things like the degree of the oxygen, but
playing around with Daylight's tools tells me I was wrong.

I guess then the original question has been answered by the pragmatic
principle that states: hydrogens are not atoms iff they are of unspecified
isotope. In the deuterium case we have a hydrogen atom attached to the
oxygen that is not mentioned in the reaction and so, like any atom, is
carried across with the mapped atoms. In the [OH] case we have an oxygen
with a property and that property need not be conserved by reaction
transforms, and so isn't. And in the [H]O case it is internally converted
to an [OH] before the reaction takes place.

Thanks once again for your time,

Toby Wright

--
InhibOx Ltd


On 8 April 2014 02:35, Greg Landrum greg.land...@gmail.com wrote:

 Hi Toby,

 On Mon, Apr 7, 2014 at 3:37 PM, Toby Wright toby.wri...@inhibox.com wrote:


 Noticed something odd but I'm not confident enough with reaction SMARTS
 to say it's a bug. I'm reacting with an OH, leading to the oxygen losing
 the hydrogen and gaining a second bond. For example:

  rxn = AllChem.ReactionFromSmarts('[O:1]>>C[O:1]')
  mol = Chem.MolFromSmiles('OC')
  p = rxn.RunReactants((mol,))[0][0]
  Chem.SanitizeMol(p)
 rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
  Chem.MolToSmiles(p)
 'COC'

 So far so good. Now to add explicit hydrogens to the oxygen. If mol is
 [OH]C or [H]OC the same behaviour as above happens. However if it is
 [2H]OC we run into:

 ValueError: Sanitization error: Explicit valence for atom # 1 O, 3, is
 greater than permitted


 because the deuterium is being preserved whereas in the other cases the
 hydrogen is discarded. I can't find anything in the SMARTS documentation to
 suggest that this is the correct behaviour so I'm going to suggest that if
 the [H] was being discarded by the reaction then so should the [2H].


 You've diagnosed what is happening correctly: the RemoveHs() functionality
 does not remove the [2H] since that's not something that can be replaced by
 inspecting the valence of the O atom.

 That's not the real problem here though. The reaction above also won't
 work for ethers or anything with a double bond to an O. What you more
 likely want is something like:
 rxn = AllChem.ReactionFromSmarts('[OH;D1:1]>>C[O:1]')
 This will match CO, but not C=O, COC, or CO[2H].
 If you want the reaction to also apply to the deuterated species, which
 you say later in your email you don't, I think you're going to have to
 AddHs to the molecules before calling RunReactants() and explicitly include
 the H in the reaction query. Or, of course, you could add a second
 reaction to deal with Hs that are actually present.
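
The matching claims for that pattern can be checked directly. A small sketch, assuming a current RDKit Python build; only the reactant side `[OH;D1]` is tested here, and the `matches` dictionary is just an illustrative container.

```python
from rdkit import Chem

# Reactant-side query: an oxygen with one hydrogen and exactly one explicit neighbour.
pattern = Chem.MolFromSmarts('[OH;D1]')

matches = {
    smi: Chem.MolFromSmiles(smi).HasSubstructMatch(pattern)
    for smi in ('CO', 'C=O', 'COC', 'CO[2H]')
}
for smi, hit in matches.items():
    print(smi, hit)
# CO True
# C=O False
# COC False
# CO[2H] False
```

The deuterated case fails on D1 because the [2H] is kept as a real atom, so the oxygen has two explicit neighbours.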

 -greg



 In either case it's not a problem for me as I have no particular interest
 in Deuterium containing molecules so I don't need a workaround or quick
 fix. I just happened across the behaviour and thought it worth reporting.

 Yours,

 Toby Wright

 --
 InhibOx Ltd




[Rdkit-discuss] [H] vs [2H] in reactions

2014-04-07 Thread Toby Wright
Hi,

Noticed something odd but I'm not confident enough with reaction SMARTS to
say it's a bug. I'm reacting with an OH, leading to the oxygen losing the
hydrogen and gaining a second bond. For example:

 rxn = AllChem.ReactionFromSmarts('[O:1]>>C[O:1]')
 mol = Chem.MolFromSmiles('OC')
 p = rxn.RunReactants((mol,))[0][0]
 Chem.SanitizeMol(p)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
 Chem.MolToSmiles(p)
'COC'

So far so good. Now to add explicit hydrogens to the oxygen. If mol is
[OH]C or [H]OC the same behaviour as above happens. However if it is
[2H]OC we run into:

ValueError: Sanitization error: Explicit valence for atom # 1 O, 3, is
greater than permitted

because the deuterium is being preserved whereas in the other cases the
hydrogen is discarded. I can't find anything in the SMARTS documentation to
suggest that this is the correct behaviour so I'm going to suggest that if
the [H] was being discarded by the reaction then so should the [2H].

In either case it's not a problem for me as I have no particular interest
in Deuterium containing molecules so I don't need a workaround or quick
fix. I just happened across the behaviour and thought it worth reporting.

Yours,

Toby Wright

--
InhibOx Ltd


Re: [Rdkit-discuss] Further issue with reactions and chirality

2014-03-31 Thread Toby Wright
Further investigation shows that this issue is not related to the reaction
code at all, it's a general SMILES canonicalisation bug I'm afraid.
Consider the following:

 mol = Chem.MolFromSmiles('C1C[C@@H](CC)CC[C@@H](CC)1')
 print(Chem.MolToSmiles(mol, isomericSmiles=True))
CC[C@@H]1CC[C@@H](CC)CC1

The output should describe the same molecule as the input, but plugging
those strings into the Daylight website's depiction tool gives molecules
with different chirality.
This behaviour is observed in RDKit 2013.09 with no custom patches.

Yours,

Toby Wright

--
InhibOx Ltd







Re: [Rdkit-discuss] Further issue with reactions and chirality

2014-03-28 Thread Toby Wright
Oops, forgot to mention: This is with the solution to GitHub issue #233
(https://github.com/rdkit/rdkit/issues/233) patched into my RDKit build.

Yours,

Toby Wright

--
InhibOx Ltd





[Rdkit-discuss] Further issue with reactions and chirality

2014-03-28 Thread Toby Wright
Hi,

I believe I've found a bug in the new code that deals with reactions that
have chirality specified for untagged product atoms. Consider the following:

 rxn = AllChem.ReactionFromSmarts('[C:1].[C:2]>>C1C[C@@H](C[C:1])CC[C@@H](C[C:2])1')
 m1 = Chem.MolFromSmiles('FC')
 m2 = Chem.MolFromSmiles('BrC')
 p = rxn.RunReactants((m1,m2))[0][0]
 Chem.SanitizeMol(p)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
 Chem.MolToSmiles(p, isomericSmiles=True)
'FCC[C@@H]1CC[C@@H](CCBr)CC1'

The output looks about right: the [C@@H]s are both still [C@@H]. But whereas
before both were approached from around the ring, the canonicalisation now
has us approaching one of them from outside the ring. Both substituents on
the ring should point towards the viewer, and if I convert the product part
of the above reaction to a PNG I get:
[image: Inline images 1]
but in the output one is towards and the other is away:
[image: Inline images 2]

Note that I can work around this. If I specify my reaction as
[C:1].[C:2]>>[C:1]C[C@H]1CC[C@@H](C[C:2])CC1, thus aping the atom ordering
of the product RDKit will give me, I get the chirality I want, at least in
my test cases so far.

Yours,

Toby Wright

--
InhibOx Ltd


[Rdkit-discuss] Reactions and chirality: Untagged chiral product atoms

2014-03-14 Thread Toby Wright
Hi,

Looking over the documentation and discussion threads I've found solid and
sensible answers for how chiral molecules and reactions are handled in
almost every case, but I've hit what seems to be an issue in a situation I
can't find discussed.
I have atoms in my product that are untagged and do not appear in my
reactants. This is because I'm shortcutting a number of steps in what is
happening in the real chemistry where these extra atoms are added. In
general RDKit behaves exactly as I would hope when I do this, adding
these new atoms to the product without taking them from any reactant. But
where these new atoms have chiral information it is being lost, as shown by
the following example:

 rxn = AllChem.ReactionFromSmarts('[F:1][C:2]([C:3])[I:4]>>[F:1][C:2]([C:3][C@H]([OH])Br)[Cl:4]')
 m = Chem.MolFromSmiles('FC(C)I')
 p = rxn.RunReactants((m,))[0][0]
 Chem.SanitizeMol(p)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
 Chem.MolToSmiles(p,isomericSmiles=True)
'OC(Br)CC(F)Cl'

The output I was hoping for was O[C@@H](CC(F)Cl)Br but the chiral
information in the unmapped atoms in the product part of the reaction
specification appears to have been lost. Is this a bug or am I going about
my reaction in the wrong way?

Yours,

Toby Wright

--
InhibOx Ltd


Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching

2014-03-07 Thread Toby Wright
Thanks Greg,

The final strange behaviour I've noticed that could trip fellow users up is
with matching Kekulé versus aromatic representations of the same molecule
in SMARTS against SMILES. Most surprisingly, C1=CC=CC=C1 is not a
substructure of itself but has c1ccccc1 as a substructure (if the left-hand
term is SMILES and the right is SMARTS in both cases).
Code to demonstrate what I mean below:

 aromatic_benzene_smiles = Chem.MolFromSmiles('c1ccccc1')
 aromatic_benzene_smarts = Chem.MolFromSmarts('c1ccccc1')
 kekule_benzene_smiles = Chem.MolFromSmiles('C1=CC=CC=C1')
 kekule_benzene_smarts = Chem.MolFromSmarts('C1=CC=CC=C1')
 aromatic_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
True
 aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smiles)
True
 aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
False
 kekule_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
False
 kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smiles)
True
 kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
True

I think I can see why there is a difference in behaviour, a double bond is
not the same thing as an aromatic bond. In the SMILES case a conversion can
take place because the context is complete but in the SMARTS case it is not
(or at least might not be). But I thought I'd point out the issue in any
case. The workaround is to always explicitly make atoms aromatic in SMARTS
if you wish them to match aromatic SMILES rather than relying on the kekule
representation to sort it for you.
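
A compact version of that workaround, as a sketch that assumes a current RDKit Python build. The third query, written with `[#6]` atoms and `~` bonds, is an extra suggestion not taken from the thread: it names the element without committing to aromatic or aliphatic, so it matches either form.

```python
from rdkit import Chem

# Parsing the Kekulé SMILES runs aromaticity perception, so the atoms end up aromatic.
benzene = Chem.MolFromSmiles('C1=CC=CC=C1')

kekule_query = Chem.MolFromSmarts('C1=CC=CC=C1')    # aliphatic C, double/single bonds
aromatic_query = Chem.MolFromSmarts('c1ccccc1')     # aromatic c
generic_query = Chem.MolFromSmarts('[#6]1~[#6]~[#6]~[#6]~[#6]~[#6]~1')  # any carbon, any bond

print(benzene.HasSubstructMatch(kekule_query))    # False
print(benzene.HasSubstructMatch(aromatic_query))  # True
print(benzene.HasSubstructMatch(generic_query))   # True
```

The generic form trades specificity for robustness: it would also match cyclohexane, so it only suits cases where that is acceptable.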

Yours,

Toby Wright

--
InhibOx Ltd


On 6 March 2014 04:55, Greg Landrum greg.land...@gmail.com wrote:



 On Wed, Mar 5, 2014 at 4:03 PM, Toby Wright toby.wri...@inhibox.com wrote:


 This is probably related to the above so I thought I'd post it on this
 thread. I am noticing inconsistent behaviour when a molecule created via
 SMARTS that contains an 'or' statement has HasSubstructMatch called on it,
 as opposed to it being the argument to HasSubstructMatch. A simple example
 follows:

  O_or_C = Chem.MolFromSmarts('[O,C]')
  O = Chem.MolFromSmiles('O')
  C = Chem.MolFromSmiles('C')
  O_or_C.HasSubstructMatch(O)
 True
  O_or_C.HasSubstructMatch(C)
 False
  O.HasSubstructMatch(O_or_C)
 True
  C.HasSubstructMatch(O_or_C)
 True

 We also see:
  C_or_O = Chem.MolFromSmarts('[C,O]')
  C_or_O.HasSubstructMatch(O)
 False
  C_or_O.HasSubstructMatch(C)
 True

 so the order of elements in a SMARTS 'or' statement changes the
 behaviour, which is unexpected.


 This is indeed related. This is a case I didn't cover above: the
 SMILES/SMARTS match. The behavior above is expected from the point of view
 of what's in the code, though I can understand how it may not make much
 sense from the perspective of someone using the code. :-) The above should
 probably return False in both cases.

 In general, one should probably expect that using the HasSubstructMatch()
 method of a molecule constructed from SMARTS is likely to produce strange
 results. Getting a general purpose query--query matcher to work is, as far
 as I can tell, a decidedly non-trivial problem.

 -greg




Re: [Rdkit-discuss] Two nitrogens in a 5 membered ring

2014-03-05 Thread Toby Wright
Thanks all for informative and helpful responses, the behaviour I was
struggling to understand now makes perfect sense.

Toby Wright

--
InhibOx Ltd


On 4 March 2014 04:06, Greg Landrum greg.land...@gmail.com wrote:

 Bob hit the nail on the head.

 The first case, N1N=CC=C1, is aromatic because the RDKit sees that the
 first nitrogen has two bonds to it, assigns a hydrogen, and then sees a
 conjugated pi system with 6 electrons that is flagged as aromatic.
 Something similar would happen with the aromatic form [nH]1nccc1: first the
 ring system is kekulized to yield N1N=CC=C1, then the sanitization proceeds
 from there. The same thing would happen with the equivalent n1[nH]ccc1.

 The second case, N1=NC=CC1, has a C (the last one) that only has single
 bonds to it. This is assigned sp3 hybridization, so there's no conjugated
 ring system for aromaticity to be perceived in.

 The final case, n1nccc1, is an instance of the pyrrole problem: aromatic
 N's that need an implicit H on them, should have that implicit H present in
 the aromatic SMILES.
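
The cases discussed above can be compared side by side. A minimal sketch, assuming a current RDKit Python build; `aromatic_flags` is a hypothetical helper written for this illustration.

```python
from rdkit import Chem


def aromatic_flags(smiles):
    """Return per-atom aromaticity flags, or None if RDKit rejects the SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # e.g. the input cannot be kekulized
        return None
    return [atom.GetIsAromatic() for atom in mol.GetAtoms()]


for smi in ('N1N=CC=C1', '[nH]1nccc1', 'N1=NC=CC1', 'n1nccc1'):
    print(smi, aromatic_flags(smi))
# N1N=CC=C1 [True, True, True, True, True]
# [nH]1nccc1 [True, True, True, True, True]
# N1=NC=CC1 [False, False, False, False, False]
# n1nccc1 None
```

The last input also makes RDKit log a kekulization error; the helper reports the failure as None instead of passing a missing molecule along.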

  -greg




 On Mon, Mar 3, 2014 at 5:59 PM, Bob Funchess bfunch...@kelaroo.com wrote:

 Hi Toby,



 I'd say it's more of a limitation inherent in Kekule representations than
 an actual bug in RDKit.  Trying to get too clever in figuring out what
 the user meant usually causes more harm than good.



 I'm not sure what version of RDKit you're using, but the aromatic
 specification with an explicit hydrogen on one of the nitrogen atoms works
 for me:



  Chem.MolFromSmiles('n1[nH]ccc1').Debug();
 Atoms:
 0 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
 1 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
 2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 4 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 Bonds:
 0 0-1 order: 12 conj?: 1 aromatic?: 1
 1 1-2 order: 12 conj?: 1 aromatic?: 1
 2 2-3 order: 12 conj?: 1 aromatic?: 1
 3 3-4 order: 12 conj?: 1 aromatic?: 1
 4 4-0 order: 12 conj?: 1 aromatic?: 1



 The double bonds in the Kekule representations here can be between atom
 pairs 1,2 and 3,4 or between atom pairs 2,3 and 4,0.  Putting one between
 pair 0,1 leaves atom 4 with two single bonds to it (and therefore, to
 satisfy valence requirements, two implicit hydrogens); I'm not horribly
 surprised that RDKit perceives that as aliphatic.  You can see that's
 what's happening in your second example where the hybridization of atom 4
 is 4 (sp3) instead of 3 (sp2).



 Regards,

 Bob



 --
 Bob Funchess, Ph.D., Senior Scientist
 Kelaroo, Inc, www.kelaroo.com
 bfunch...@kelaroo.com, (858) 259-7561 x3





Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching

2014-03-05 Thread Toby Wright
Hi,

This is probably related to the above so I thought I'd post it on this
thread. I am noticing inconsistent behaviour when a molecule created via
SMARTS that contains an 'or' statement has HasSubstructMatch called on it,
as opposed to it being the argument to HasSubstructMatch. A simple example
follows:

 O_or_C = Chem.MolFromSmarts('[O,C]')
 O = Chem.MolFromSmiles('O')
 C = Chem.MolFromSmiles('C')
 O_or_C.HasSubstructMatch(O)
True
 O_or_C.HasSubstructMatch(C)
False
 O.HasSubstructMatch(O_or_C)
True
 C.HasSubstructMatch(O_or_C)
True

We also see:
 C_or_O = Chem.MolFromSmarts('[C,O]')
 C_or_O.HasSubstructMatch(O)
False
 C_or_O.HasSubstructMatch(C)
True

so the order of elements in a SMARTS 'or' statement changes the behaviour,
which is unexpected.

Yours,

Toby Wright

--
InhibOx Ltd


On 5 March 2014 10:10, Christos Kannas chriskan...@gmail.com wrote:

 Hi Greg,

 Thanks a lot for the explanation.
 It makes things clearer now.
 Well the reason I'm doing SMARTS-SMARTS match is because I would like to
 match functional groups with the reactants in reactions.

 Regards,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608



 On 5 March 2014 04:44, Greg Landrum greg.land...@gmail.com wrote:

 Hi Christos,


 On Tue, Mar 4, 2014 at 3:46 PM, Christos Kannas chriskan...@gmail.com wrote:

 Hi all,

 Why does the following happen?

 In [1]: from rdkit import Chem
 In [2]: from rdkit.Chem import AllChem
 In [3]: from rdkit.Chem import Draw

 In [4]: patt = Chem.MolFromSmarts('[CH;D2;!$(C-[!#6;!#1])]=O')

 In [5]: z2 = Chem.MolFromSmarts('[*]-C-C([H])(=O)', 1)
 In [6]: print(Chem.MolToSmiles(z2))
 [*]CC=O
 In [7]: print(Chem.MolToSmarts(z2))
 *-C-[C!H0]=O
 In [9]: z2.HasSubstructMatch(patt)
 Out[9]: False

 In [10]: z3 = Chem.MolFromSmiles(Chem.MolToSmiles(z2))
 In [11]: print(Chem.MolToSmiles(z3))
 [*]CC=O
 In [12]: print(Chem.MolToSmarts(z3))
 [*]-[#6]-[#6]=[#8]
 In [13]: z3.HasSubstructMatch(patt)
 Out[13]: True

 Shouldn't z2 and z3 carry the same information?


 The way SMARTS/SMARTS matches are handled differs from the way
 SMARTS/SMILES matches work.
  The short answer is that when doing a SMARTS/SMARTS match, the RDKit
 compares the queries to each other; when doing a SMARTS/SMILES match, on
 the other hand, it checks to see if the atoms in the SMILES molecule match
 the queries in the SMARTS molecule.

 A bit longer answer:
 Molecules built using MolFromSmiles contain Atoms, molecules built using
 MolFromSmarts contain QueryAtoms. Both atoms and QueryAtoms have a Match()
 method that takes another Atom or QueryAtom as an argument and returns
 whether or not the two match.
 The substructure matching code makes heavy use of this Match() method.
 QueryAtom.Match(Atom) checks to see if the Atom satisfies the query.
 QueryAtom.Match(QueryAtom) checks to see if the queries on the atoms are
 the same. This uses a crude approach that is easy to fool, but I assume
 that a SMARTS-SMARTS match is not a frequent thing someone wants to do.
 query-query matching is also not a particularly easy problem to solve in a
 general way.
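
The two kinds of Match() described above can be pictured with a toy model in plain Python. This is not RDKit code: `ToyAtom` and `ToyQueryAtom` are made-up classes, and comparing the stored alternatives for equality is only one simple way to imitate a crude "are the queries the same" check.

```python
class ToyAtom:
    """A concrete atom: it simply has an atomic number."""

    def __init__(self, atomic_num):
        self.atomic_num = atomic_num


class ToyQueryAtom:
    """A query: a list of acceptable atomic numbers, like the SMARTS [O,C]."""

    def __init__(self, allowed):
        self.allowed = tuple(allowed)  # order is kept on purpose

    def match(self, other):
        if isinstance(other, ToyQueryAtom):
            # Query against query: only checks that the two queries look the same.
            return self.allowed == other.allowed
        # Query against atom: does the atom satisfy the query?
        return other.atomic_num in self.allowed


o_or_c = ToyQueryAtom([8, 6])
c_or_o = ToyQueryAtom([6, 8])

print(o_or_c.match(ToyAtom(6)))             # True: carbon satisfies [O,C]
print(o_or_c.match(ToyAtom(7)))             # False: nitrogen does not
print(o_or_c.match(ToyQueryAtom([8, 6])))   # True: identical queries
print(o_or_c.match(c_or_o))                 # False: same meaning, different form
```

The last line shows why a sameness check is easy to fool: two queries that accept exactly the same atoms still compare as different.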

 -greg








[Rdkit-discuss] Two nitrogens in a 5 membered ring

2014-03-03 Thread Toby Wright
Hi,

If I have a five-membered ring with 2 consecutive Ns and alternating single
and double bonds expressed by the SMILES N1N=CC=C1, RDKit gives me a
molecule in which every atom is aromatic. If I give it N1=NC=CC1, it gives
me a molecule in which every atom is aliphatic. If I give it n1nccc1, it
gives me a kekulization error. I, possibly naively, thought the forms would
be all aromatic or all aliphatic. Am I missing something or is this a bug?

>>> Chem.MolFromSmiles('N1N=CC=C1').Debug()
Atoms:
0 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
1 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
4 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
Bonds:
0 0-1 order: 12 conj?: 1 aromatic?: 1
1 1-2 order: 12 conj?: 1 aromatic?: 1
2 2-3 order: 12 conj?: 1 aromatic?: 1
3 3-4 order: 12 conj?: 1 aromatic?: 1
4 4-0 order: 12 conj?: 1 aromatic?: 1

>>> Chem.MolFromSmiles('N1=NC=CC1').Debug()
Atoms:
0 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 0 chi: 0
1 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 0 chi: 0
2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
4 6 C chg: 0  deg: 2 exp: 2 imp: 2 hyb: 4 arom?: 0 chi: 0
Bonds:
0 0-1 order: 2 conj?: 1 aromatic?: 0
1 1-2 order: 1 conj?: 1 aromatic?: 0
2 2-3 order: 2 conj?: 1 aromatic?: 0
3 3-4 order: 1 conj?: 0 aromatic?: 0
4 4-0 order: 1 conj?: 0 aromatic?: 0

>>> Chem.MolFromSmiles('n1nccc1').Debug()
[15:31:44] Can't kekulize mol
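
One possible reading of these three results, offered as a hedged sketch rather than a confirmed diagnosis: the two Kekule forms are different tautomers (N1N=CC=C1 carries an N-H, while N1=NC=CC1 has a ring CH2, as the imp: 2 / hyb: 4 carbon in the Debug output suggests), and the aromatic form n1nccc1 does not say which nitrogen carries the hydrogen. The fourth SMILES below, with an explicit [nH], is my assumption about the intended molecule; aromatic_atom_count is a helper written for this note.

```python
from rdkit import Chem


def aromatic_atom_count(smiles):
    """Return the number of aromatic atoms, or None if the SMILES fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


# The three forms from the message, plus an aromatic form with an explicit [nH].
for smi in ('N1N=CC=C1', 'N1=NC=CC1', 'n1nccc1', 'c1cc[nH]n1'):
    print(smi, aromatic_atom_count(smi))
```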

Yours,

Toby Wright

--
InhibOx Ltd


[Rdkit-discuss] Fwd: SMARTS Substructure matching

2014-02-19 Thread Toby Wright
Hi Christos,

If you add hydrogens to m3 after creating it in RDKit then both m1 and m2
are recognised as substructures of m3. See below for how I achieved this:

>>> from rdkit import Chem
>>> m1 = Chem.MolFromSmarts('[C:3][C:4](=[O:5])[O:6]([H:100])')
>>> m2 = Chem.MolFromSmarts('[C:3][C:4](=[O:5])[O;H:6]')
>>> m3 = Chem.MolFromSmiles('CC(=O)O')
>>> m3H = Chem.AddHs(m3)
>>> m3.HasSubstructMatch(m1)
False
>>> m3H.HasSubstructMatch(m1)
True
>>> m3.HasSubstructMatch(m2)
True
>>> m3H.HasSubstructMatch(m2)
True

Hope that helps.

Yours,

Toby Wright

--
InhibOx Ltd, Oxford



On 19 February 2014 10:25, Christos Kannas chriskan...@gmail.com wrote:

 Hi all,

 In my current project I'm working on reaction-based multiobjective de novo
 design, and I have a set of reactions that I have converted into SMIRKS and
 reaction SMARTS.

 The problem I have is that when a reactant pattern in SMARTS, as
 required by SMIRKS, has explicit mapped hydrogens that play a role in the
 reaction, and I request a substructure search against a compound that
 contains the substructure in question, it cannot find a match. But when I
 change the pattern to not have explicit mapped hydrogens, the substructure
 search is successful.

 To help you understand I've created this small IPython Notebook
 http://nbviewer.ipython.org/gist/CKannas/9089271

 Can you give me the reasons why this happens?

 Best,
 Christos

 --

 Christos Kannas
 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608



Re: [Rdkit-discuss] Possible rotatable bonds replacement

2014-01-31 Thread Toby Wright
Hi,

I favour option 1 but not strongly over option 3. Option 2 is cleanest but
I think the cost to users that expect the existing behaviour is too high. I
don't see much difference in the confusion levels between:
numRotatableBonds() vs numStrictRotatableBonds()
and
numRotatableBonds() vs numRotatableBonds(strict=true)
as neither is truly clean if the user thinks that the two definitions are
interchangeable.
The invariant numRotatableBonds(X) >= numStrictRotatableBonds(X) holds, which
is why I was thinking that one is a strict version of the other, but I'd
welcome a better name for the new function/variable.
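
To make the two API shapes concrete, here is a minimal, library-independent sketch. Everything in it is hypothetical: bonds are plain dicts of flags rather than RDKit objects, and strict_excluded is a placeholder for whatever extra exclusions the strict definition applies. It only illustrates that both spellings can share one implementation and that the strict count can never exceed the default count.

```python
# Toy model: each bond is a dict of boolean flags (hypothetical, not RDKit data).
def _counts_as_rotatable(bond, strict):
    # Shared lenient rule: a single, acyclic, non-terminal bond.
    lenient = bond["single"] and not bond["in_ring"] and not bond["terminal"]
    # The strict rule only ever removes bonds from the lenient set.
    return lenient and not (strict and bond["strict_excluded"])


def num_rotatable_bonds(bonds, strict=False):
    """Option 3: one function, old behaviour by default, strict on request."""
    return sum(1 for bond in bonds if _counts_as_rotatable(bond, strict))


def num_strict_rotatable_bonds(bonds):
    """Option 1: a separately named function for the strict definition."""
    return num_rotatable_bonds(bonds, strict=True)


bonds = [
    {"single": True, "in_ring": False, "terminal": False, "strict_excluded": False},
    {"single": True, "in_ring": False, "terminal": False, "strict_excluded": True},
    {"single": True, "in_ring": True, "terminal": False, "strict_excluded": False},
    {"single": False, "in_ring": False, "terminal": False, "strict_excluded": False},
]

print(num_rotatable_bonds(bonds))         # 2
print(num_strict_rotatable_bonds(bonds))  # 1
```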

Yours,

Toby Wright

--
InhibOx Ltd
Oxford



On 31 January 2014 11:05, JP jeanpaul.ebe...@inhibox.com wrote:

 My 2p worth:

 I am not a big fan of outright replacing the NumRotatableBonds
 implementation (option 2).  This is quite a popular descriptor which is
 used in many ways (e.g. QSAR models, conformer generation, property
 calculation, etc.).  IF we are lucky (or skilful, or have had enough time),
 we have tests written out for everything which will break as soon as
 we get different rotatable bond counts, and different results.  We can
 then revalidate our protocols using the new (strict) rotatable counts.
 Perhaps we get better correlations/enrichments/AUCs etc.! Yeah!

 On the other hand option (1), having two methods NumRotatableBonds() and
 NumStrictRotatableBonds() will lead to some confusion.  Greg has a point
 about different people and/or libraries intermixing between the two.

 Like Paul, I prefer option (3) - with the default behaviour giving the old
 rotatable counts (not strict).  This does not come for free either, as the
 API becomes slightly less clean (and what to do in the future when, for
 example, someone finds a non-SMARTS based way to do this -- add another
 parameter?).  Still I think this is the least of all evils.

 Thanks Toby & Greg!
 JP


 On 31 January 2014 06:54, paul.czodrow...@merckgroup.com wrote:

  I could add the new descriptor as Toby provided it. People are then
  free to pick between NumRotatableBonds() and NumStrictRotatableBonds().
  This has the advantage of maintaining strict backwards
  compatibility, but I could imagine it being confusing/irritating to
  people using the code to have to choose between them (or, worse, using
  both).

  Another option is to just replace the current NumRotatableBonds()
  SMARTS with the new one.
  This loses backwards compatibility, but replaces NumRotatableBonds()
  with something more correct.

  Finally, I could take a hybrid approach: replace the default
  NumRotatableBonds() with the new one, but add an extra argument that
  allows the old one to be used.

  I'm leaning towards the second option. I'd normally go with the
  third, but I almost view this as a bug fix for the rotatable bonds
  definition.
 
  Comments? suggestions? Other options?

 I like your idea of your hybrid approach which would mean backwards
 compatibility.


 paul




Re: [Rdkit-discuss] Counting amide groups in rotatable bond counts

2014-01-03 Thread Toby Wright
Hi,

Sorry for the extremely slow reply, thanks for the insights and I hope you
all had an excellent Christmas break.

I think the best thing is to roll my own definition of a bond which would
be rotatable if not for the fact that it's an amide, something like
$([NH]!D1)-!@C=O, and then take the number of these bonds away from
RDKit's rotatable bond count. If you simply take away the number of amides
calculated by RDKit from the number of rotatable bonds, you hit an error
whenever an amide bond was not considered rotatable in the first place (for
example because the N was terminal).

The Chemaxon definition of rotatable bonds troubles me somewhat. Given the
following molecule:
CC(=O)NCC
it claims there are no rotatable bonds at all. The non-amide N-C bond is
discounted because one of its atoms fulfils the pattern ([NH]!@C(=O)); that
is, it is an N connected to a C=O group, even though this connection is not
made by the N-C bond in question.

Thanks again,

Yours,

Toby Wright

--
InhibOx Ltd
Oxford


On 24 December 2013 15:46, Gerebtzoff, Gregori gregori.gerebtz...@roche.com
 wrote:

 Hi Toby,

 One additional note on what Greg wrote:
 you can define another smarts pattern for the identification of rotatable
 bonds:
 Lipinski.RotatableBondSmarts = Chem.MolFromSmarts(...)

 Some smarts from the literature:
 Daylight:
 [!$(*#*)!D1$(*(-[!#1])~[!#1])]-!@[!$(*#*)!D1$(*(-[!#1])~[!#1])]
 Chemaxon: [!$([NH]!@C(=O))!D1!$(*#*)]-!@[!$([NH]!@C(=O))!D1!$(*#*)]

 Best,

 Grégori





Re: [Rdkit-discuss] atom equivalence for substructure matching

2013-10-30 Thread Toby Wright
While this doesn't answer your core question of whether RDKit can do what
you want without manually editing the SMARTS strings, if you do end up
hacking it, using 'C[CX3v4](~O)~O' might be cleaner than 'CC(~O)~O' as it
would exclude the case where both Os were singly bonded.

Yours,

Toby Wright

--
InhibOx Ltd


On 30 October 2013 01:12, S.L. Chan slch...@yahoo.com wrote:

 Good evening,

 I would like to get an exhaustive substructure matching of a molecule onto
 itself. Generally I could use the GetSubstructMatches function with the
 uniquify=False option. However, if there is a carboxylate or a
 guanidinium head around, this would give only one side of the match since
 the two oxygens / nitrogens are not considered equivalent:

 >>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
 >>> patt = Chem.MolFromSmarts('CC(=O)[O-]')
 >>> print(mol.GetSubstructMatches(patt, uniquify=False))
 ((0,1,2,3),)

 Now, I suppose I could do an ugly (could in principle match two single
 bonds) hack to achieve my purpose:
 >>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
 >>> patt = Chem.MolFromSmarts('CC(~O)~O')
 >>> print(mol.GetSubstructMatches(patt, uniquify=False))
 ((0,1,2,3), (0,1,3,2))

 However, this would mean that I would need to manually edit the smarts
 string for all molecules. I just wonder if there is something similar to
 the Kekulize command that would make the two oxygens equivalent? Or are
 there other ways around this?

 Ling




[Rdkit-discuss] Surprising DeleteSubstructs(smiles, smiles) behaviour

2013-10-09 Thread Toby Wright
Hi,

I just ran into a small gotcha and thought I'd share it. I have a molecule
with a fragment of 6 carbons, 5 of which form a ring, and I am deleting
fragments that match CCCCCC. I thought that if I were working in SMILES
the ring fragment would be spared, but not if it was a SMARTS. However, as
the following code shows, it gets deleted either way.

>>> import rdkit
>>> from rdkit import Chem
>>> query = Chem.MolFromSmiles('C.CC1CCCC1')
>>> remove_as_smiles = Chem.MolFromSmiles('CCCCCC')
>>> remove_as_smarts = Chem.MolFromSmarts('CCCCCC')
>>> print(Chem.MolToSmiles(Chem.DeleteSubstructs(query, remove_as_smiles,
...                                              onlyFrags=True)))
C
>>> print(Chem.MolToSmiles(Chem.DeleteSubstructs(query, remove_as_smarts,
...                                              onlyFrags=True)))
C

So now I know to use [C!r][C!r][C!r][C!r][C!r][C!r] explicitly if that's
what I mean. Hope this saves someone else from stumbling into my mistakes.

Yours,

Toby Wright

--
InhibOx Ltd


[Rdkit-discuss] Inconsistency across elements in making Hs explicit

2013-09-27 Thread Toby Wright
Hi,

I've observed an odd behaviour in RDKit with listing explicit hydrogens in
SMILES where the original molecules were generated from SD files. As the
code below shows, if I ask "What is the SMILES for a single C atom?" I get
"C", but if I ask for silicon I get "[SiH4]". Any reason why this might be?
I've also observed that in RDKit 2013 Q2 I get "[Fe]" as the SMILES from a
single iron atom, but in RDKit 2011 Q4 I get "[FeH6]", and I can't see
anything in the release notes to explain this change. I also have examples
involving atoms in larger molecules but I thought these provided the
simplest examples.

Example files and interactive python snippet:

>>> sup = Chem.SDMolSupplier('SingleSi.sdf')
>>> sup2 = Chem.SDMolSupplier('SingleC.sdf')
>>> print(Chem.MolToSmiles(sup[0], canonical=True, isomericSmiles=True))
[SiH4]
>>> print(Chem.MolToSmiles(sup2[0], canonical=True, isomericSmiles=True))
C

SingleC.sdf:
SingleC
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$

SingleSi.sdf:
SingleSi
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 Si  0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$

Thanks,

Toby

--
InhibOx Ltd


[Rdkit-discuss] Can't read SDF data lines when CTAB is in V3000 format

2013-08-21 Thread Toby Wright
Hi,

I'm trying to read the data lines from an SD file where the CTAB is in
V3000 format. If the file v3000propIssue.sdf contains the following:
testMol


  0  0  0     0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 1 0 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0 0 0 0
M  V30 END ATOM
M  V30 END CTAB
M  END
>  <TestProp>
42

$$$$

and it is read by an SDMolSupplier, it loads correctly (as shown by the
Debug output) apart from the data lines, which are not converted to RDKit
properties, as the following interactive code snippet shows:

>>> import rdkit
>>> from rdkit import Chem
>>> mol = next(Chem.SDMolSupplier('v3000propIssue.sdf'))
[10:59:33] ERROR: Problems encountered parsing data fields
[10:59:33] ERROR: moving to the begining of the next molecule
>>> mol.HasProp('_Name')
1
>>> mol.HasProp('TestProp')
0
>>> mol.Debug()
Atoms:
0 6 C chg: 0  deg: 0 exp: 0 imp: 4 hyb: 4 arom?: 0 chi: 0
Bonds:

Any ideas why this might be?
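
Until the cause is known, one workaround is to read the data items straight from the file text. The sketch below is a hypothetical helper written for this note (parse_sd_data_items is not an RDKit function). It assumes the usual SD layout: data items follow the "M  END" line, each header looks like ">  <Name>", a value runs until a blank line, and "$$$$" ends the record.

```python
import re

# A data header starts with ">" and carries the field name in angle brackets.
HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sd_data_items(record_text):
    """Return {field name: value} for one SD record given as a string."""
    lines = record_text.splitlines()
    # Data items start after the "M  END" line that closes the CTAB.
    end_indices = [i for i, line in enumerate(lines) if line.startswith("M  END")]
    if not end_indices:
        return {}
    props = {}
    name = None
    for line in lines[end_indices[0] + 1:]:
        if line.startswith("$$$$"):
            break  # end of this record
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            props[name] = []
        elif name is not None and line.strip():
            props[name].append(line)
        elif not line.strip():
            name = None  # a blank line terminates the current item
    return {key: "\n".join(value) for key, value in props.items()}


record = "\n".join([
    "testMol",
    "",
    "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 1 0 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 END ATOM",
    "M  V30 END CTAB",
    "M  END",
    ">  <TestProp>",
    "42",
    "",
    "$$$$",
])

print(parse_sd_data_items(record))  # {'TestProp': '42'}
```

The returned values could then be attached to the molecule by hand, for example with mol.SetProp(name, value).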

Yours,

Toby Wright

--
InhibOx Ltd


[Rdkit-discuss] Chirality lost unless molecule sanitized on load

2013-08-14 Thread Toby Wright
Hi,

I think the following behaviour is a bug but feel free to correct me. I
have an SD file (attached) with two stereoisomers of alanine (built by
Open Babel from the SMILES). I want to read it and write its contents as
isomeric SMILES. I execute the following:

import rdkit
from rdkit import Chem

# Case 1: load without sanitization, then sanitize each molecule by hand.
smiles_writer = Chem.SmilesWriter('ChiralTest.smi', includeHeader=False,
                                  isomericSmiles=True)
suppl = Chem.SDMolSupplier('ChiralTest3D.sdf', sanitize=False)
for mol in suppl:
    Chem.SanitizeMol(mol)
    smiles_writer.write(mol)

smiles_writer.flush()
smiles_writer.close()

# Case 2: let the supplier sanitize on load.
smiles_writer2 = Chem.SmilesWriter('ChiralTest2.smi', includeHeader=False,
                                   isomericSmiles=True)
suppl2 = Chem.SDMolSupplier('ChiralTest3D.sdf', sanitize=True)
for mol in suppl2:
    smiles_writer2.write(mol)

smiles_writer2.flush()
smiles_writer2.close()

The file ChiralTest.smi now contains:
[H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] L-alanine
[H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] D-alanine

and ChiralTest2.smi contains:
C[C@H](N)C(=O)O L-alanine
C[C@@H](N)C(=O)O D-alanine


My question is why do I get different outputs depending on when
sanitization was performed?

Yours,

Toby Wright

--
InhibOx Ltd


ChiralTest3D.sdf
Description: Binary data


[Rdkit-discuss] Minor bug in Data/Crippen.txt

2011-11-04 Thread Toby Wright
Hi,

On line 11 of the file Data/Crippen.txt the label says "C2", but the
SMARTS expression, logP and MR values are as expected for case "C3"
from Wildman & Crippen '99, which suggests that the only thing wrong is
the label.

I also have a question that might be a bit foolish as I'm not an
accomplished chemist, but does the SMARTS for O11 deal correctly with the
case where the oxygen in question is bonded to aromatic atoms? If I
understand correctly it should match either aromatic or aliphatic elements
(apart from the carbon), but the SMARTS as written will only match in the
aliphatic case.

Yours,

Toby Wright

--
inhibOx
