Re: [Rdkit-discuss] Hello questions about the Synthetic Accessibility score

2020-11-15 Thread Peter Gedeck
The paper is pretty vague on implementation details. However, note that the 
code is copyright Novartis Institutes for BioMedical Research Inc. It was 
released in the public domain and at that point (2013) it was the 
implementation that was used internally at Novartis. You can therefore use the 
Python implementation in RDKit as the reference for this method. I would not 
spend any more time on finding the discrepancy. 
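
For anyone who wants to reproduce the reference values, the Contrib implementation can be called roughly as follows. This is only a minimal sketch: it assumes the RDKit installation ships the Contrib/SA_Score directory and that RDConfig.RDContribDir points at it, and the aspirin SMILES is purely illustrative.

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import RDConfig

# Assumption: the Contrib directory is installed alongside RDKit.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # noqa: E402

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, illustrative only
score = sascorer.calculateScore(mol)
print(f"SA score: {score:.2f}")  # scaled to lie between 1 (easy) and 10 (hard)
```

Comparing a port against these values, rather than against the numbers printed in the paper, avoids the discrepancy discussed in this thread.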

Best,

Peter


> On Nov 15, 2020, at 11:01 AM, Gustavo Seabra  wrote:
> 
> So, basically,  your code perfectly reproduces RDKit's Python implementation. 
>  However, those results (both yours and RDKit's) *do not* match the original 
> paper. 
> 
> It does look like a constant shift, but it is not: some molecules have a 
> different shift than others.
> 
> Questions:
> 
> 1. Are those the same molecules as in the original paper?
> 2. How well defined are the equations in the original paper?
> 
> I'm guessing the RDKit's implementation is *not* 100% the same as in the 
> original paper, as is stated in the GitHub page 
> (https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py):
> 
> # several small modifications to the original paper are included
> # particularly slightly different formula for marocyclic penalty
> # and taking into account also molecule symmetry (fingerprint density)
> 
> 
> --
> Gustavo Seabra
> From: Steven Pak 
> Sent: Saturday, November 14, 2020 12:20:47 PM
> To: Greg Landrum 
> Cc: rdkit-discuss@lists.sourceforge.net 
> Subject: Re: [Rdkit-discuss] Hello questions about the Synthetic 
> Accessibility score
>  
> Blue dots are the RDKit-based Python code vs. my C++ implementation. Orange 
> dots are my C++ implementation vs. scores extracted from the original 
> paper (Estimation of synthetic accessibility score of drug-like molecules 
> based on molecular complexity and fragment contributions). My C++ 
> implementation of the SA score is based on the Python version in RDKit, and I 
> am trying to match the RDKit values exactly (which appears to be working). 
> That is why I am a bit confused that the orange dots appear to be shifted by 
> a constant value, and I am wondering why that is. 
> 
> As for the open source comment, I will let you know. I also did the same 
> thing for QED scoring functions, and I have a couple of questions about that 
> too, which I will send an email soon. I must talk to my team about this 
> before we could step forward. 
> 
> Thanks!
> 
> On Sat, Nov 14, 2020 at 2:29 AM Greg Landrum wrote:
> Steven, 
> 
> Wow cool! Any thoughts about making that implementation open source?
> 
> Did you recalculate the Python SA score with the same version of the RDKit 
> you used for the CPP version? Did you do your implementation based on the 
> Python code (hopefully) or the algorithm description in the paper?
> 
> If the answer to both of those questions is “yes”, then I’m going to guess 
> we’d need to see the code to diagnose the problem.
> 
> Best,
> -greg
> 
> On Sat, 14 Nov 2020 at 00:06, Steven Pak wrote:
> Hello.
> 
> I have been working on a CPP version of SA score. Results are fantastic! 
> 
> As you can see in the image, the blue dots represent the SA_scores from 
> python vs scores from my CPP version. The scores are perfectly in line with 
> each other, which is great! However, the orange dots are the values from 
> RDKit vs. those in the original paper, for the 40 compounds that I found in 
> the original paper. I was just wondering why the orange dots seem to have a 
> constant shift throughout the graph. What part of the code was changed to 
> cause this? I am just curious. 
> 
> Thank you,
> -- 
> Steven Pak Pharm.D
> Ph.D Student | Rizzo Lab
> Stony Brook University (SUNY)
> Department of Pharmacological Sciences
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net 
> 
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss 
> 
> 
> 
> -- 
> Steven Pak Pharm.D
> Ph.D Student | Rizzo Lab
> Stony Brook University (SUNY)
> Department of Pharmacological Sciences


Re: [Rdkit-discuss] Programatic access to the mol sanitation process results

2018-03-09 Thread Peter Gedeck
Hello Lukas,

The file rdkit/TestRunner.py contains a class/context manager called 
OutputRedirectC. If I remember correctly, it allows these messages to be 
captured. It's not used anywhere in the RDKit code base, so it may not work 
anymore. Anyway, give it a try and, if it works, you can modify it to redirect 
the output into a variable or StringIO. 
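
As an aside, RDKit releases newer than the one discussed in this thread also provide Chem.DetectChemistryProblems(), which reports sanitization problems as objects rather than log text. A minimal sketch, assuming such a release is installed and reusing the trivalent-oxygen example from the quoted message:

```python
from rdkit import Chem, rdBase

rdBase.DisableLog("rdApp.*")  # keep the console quiet, as suggested in the thread

# Same deliberately broken molecule as in the quoted example: trivalent oxygen.
mol = Chem.MolFromSmiles("CO(C)C", sanitize=False)

problems = []
for problem in Chem.DetectChemistryProblems(mol):
    if problem.GetType() == "AtomValenceException":
        # Valence problems point at a single offending atom index.
        problems.append((problem.GetType(), problem.GetAtomIdx(), problem.Message()))

for kind, atom_idx, message in problems:
    print(kind, atom_idx, message)
```

This gives the index of the misbehaving atom directly, without parsing the log message.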

Best,

Peter


> On 9 Mar 2018, at 9:34 AM, Lukas Pravda  wrote:
> 
> Hello Greg, 
>  
> I’m very sorry for the late reply. Thank you for the hint on disabling the 
> log message, it works on my end. However, I was more interested in catching 
> the other bit i.e. which part of the structure is wrong, rather than which 
> part of the sanitization process failed. That is, accessing the message 
> ‘Explicit valence for atom # 1 O, 3, is greater than permitted’ in a form 
> that lets me find out that it is the misbehaving oxygen that causes the 
> sanitization to fail. Perhaps by piping the log information into a variable 
> or something like that.
>  
> Best,
> Lukas
>  
>  
>  
> From: Greg Landrum
> Date: Thursday, 22 February 2018 at 13:32
> To: Lukas Pravda
> Cc: RDKit Discuss
> Subject: Re: [Rdkit-discuss] Programatic access to the mol sanitation process 
> results
>  
> Hi Lukas,
>  
> On Thu, Feb 22, 2018 at 1:14 PM, Lukas Pravda wrote:
>> Dear rdkiters,
>>  
>> I’m constructing molecules from scratch using python 3.5.4 and RDKit 
>> 2017.09.2 and due to the variety of reasons some of them are violating 
>> general principles of chemistry in a way implemented in rdkit, so I’m 
>> getting information like:
>>  
>> Explicit valence for atom # 14 N, 4, is greater than permitted etc.
>>  
>> I wonder if there is a way to retrieve this piece of information 
>> programmatically, in order to work with it. Presently, rdkit only prints 
>> this out to the terminal and Chem.SanitizeMol() only returns the first 
>> sanitization flag with the issue. Ideally, I’d like no information to be 
>> printed to the console, while keeping the log info ‘Explicit valence for atom 
>> # 14 N, 4, is greater than permitted’, preferably in a structured way (in a 
>> property/method?), in order to further deal with those erroneous cases.
>  
> At least part of this is pretty straightforward.
>  
> There are two parts: 
> - making it so error messages don't go to the console 
> - capturing the failed operation.
>  
> The first is a bit fragile (i.e. doesn't always work), so you will sometimes 
> end up still seeing error messages (as here), but the second should be 
> reliable:
>  
> In [30]: rdBase.DisableLog('rdApp.*')
>  
> In [31]: m = Chem.MolFromSmiles('c1cccc1',sanitize=False)
>  
> In [32]: Chem.SanitizeMol(m,catchErrors=True)
> [14:29:37] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4
>  
> Out[32]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_KEKULIZE
>  
> In [35]: Chem.SanitizeMol(Chem.MolFromSmiles('CO(C)C',sanitize=False),catchErrors=True)
> [14:31:37] Explicit valence for atom # 1 O, 3, is greater than permitted
> Out[35]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_PROPERTIES
>  
>  
> You can see that the return value indicates what went wrong in the 
> sanitization.
>  
> I hope this helps,
> -greg
>  
>  
>  
>  
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org ! 
> http://sdm.link/slashdot___ 
> 
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net 
> 
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss 
> 



Re: [Rdkit-discuss] mmpdb installation on windows using mingw

2017-09-22 Thread Peter Gedeck
Here is a relevant stackoverflow question.

https://stackoverflow.com/questions/1948862/is-the-python-3-x-signal-library-for-windows-incomplete

What happens if you comment out that line when running on Windows?
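
A guard along the following lines would have the same effect as commenting the line out on Windows while keeping it on other platforms. This is only a sketch of the idea, not the actual mmpdb code; the function name is hypothetical.

```python
import signal


def allow_closed_output_pipe() -> bool:
    """Restore default SIGPIPE handling where the platform defines it.

    Returns True if the handler was installed, False otherwise (e.g. Windows).
    """
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False
    signal.signal(sigpipe, signal.SIG_DFL)
    return True


installed = allow_closed_output_pipe()
print("SIGPIPE handler installed:", installed)
```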

Best

Peter
On Fri, Sep 22, 2017 at 7:25 AM Markus Metz  wrote:

> Hello Christian:
>
> I am trying to install your program and get the following error message:
>
> $ mmpdb help-analysis
> Traceback (most recent call last):
>   File "C:/Users/---/Anaconda3/envs/my-rdkit-env/Scripts/mmpdb", line 8, in <module>
>     signal.signal(signal.SIGPIPE, signal.SIG_DFL) # Allow the output pipe to be closed
> AttributeError: module 'signal' has no attribute 'SIGPIPE'
>
> Not sure what to do about it. Any input would be greatly appreciated.
>
> Cheers,
> Markus
>
>


Re: [Rdkit-discuss] FindAtomEnvironmentOfRadiusN

2017-03-27 Thread Peter Gedeck
Hello,

The atom numbers start with 0. From the middle atom, there are no
environments with radius 2. You will get a result if you use the first (=0)
or the last (=2) atom. Try this:

from rdkit import Chem

m = Chem.MolFromSmiles("NCO")
i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))
i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))
i = Chem.FindAtomEnvironmentOfRadiusN(m, 3, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

and you will get:

'CN'
'NCO'
''

Is this more intuitive to you?
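
If you would rather get the largest environment that does exist, one option is to step the radius down until the function returns bonds. A minimal sketch; largest_environment is a hypothetical helper, not part of RDKit:

```python
from rdkit import Chem


def largest_environment(mol, radius, atom_idx):
    """Return (radius_used, bond_indices) for the largest non-empty environment.

    Hypothetical helper: steps the radius down until RDKit returns bonds.
    """
    for r in range(radius, 0, -1):
        bonds = list(Chem.FindAtomEnvironmentOfRadiusN(mol, r, atom_idx))
        if bonds:
            return r, bonds
    return 0, []


m = Chem.MolFromSmiles("NCO")
r_used, bonds = largest_environment(m, 2, 1)  # middle atom, radius 2 requested
submol = Chem.PathToSubmol(m, bonds)
print(r_used, Chem.MolToSmiles(submol))
```

For the middle atom of NCO this falls back to radius 1, which already covers the whole molecule.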

Best,

Peter


On Mon, Mar 27, 2017 at 9:35 AM Pavel Polishchuk 
wrote:

> Dear RDKitters,
>
> I found an issue with FindAtomEnvironmentOfRadiusN, though it may be a
> feature. However, I did not find this information in the help and did not
> expect such behavior.
> If I apply the FindAtomEnvironmentOfRadiusN function to a small molecule
> and specify a radius greater than the size of the molecule, the
> function returns an empty list of bond indices (and an empty mol).
>
> m = Chem.MolFromSmiles("NCO")
>
> i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 1)
> Chem.MolToSmiles(Chem.PathToSubmol(m, i))
>
> returns "NCO"
>
> i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 1)
> Chem.MolToSmiles(Chem.PathToSubmol(m, i))
>
> returns ""
>
> In the latter case I expected the same output, "NCO". Were my
> expectations mistaken?
>
> Kind regards,
> Pavel.
>
>


Re: [Rdkit-discuss] Drawing structure with generic labels

2017-02-16 Thread Peter Gedeck
Hello Alexis,

I had a look at the Python and the C++ code for drawing molecules.
Neither supports your requirement. It would be useful to implement it. I
can look at it in more detail and see if I could implement a quick
fix, e.g. drawing a custom label based on an atom property.

Best

Peter
On Thu, Feb 16, 2017 at 7:51 AM Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Hi everyone,
>
>
> Is it possible to draw a structure from a SMARTS that contains generic
> labels?
>
>
> The following is a valid SMARTS for a structure with an undefined
> heteroatom [N,O,P,S] and an undefined halogen [F,Cl,Br,I]:
>
>
> [#7,#8,#15,#16]=C(CC1=CC([#9,#17,#35,#53])=CC=C1)[#7,#8,#15,#16]
>
>
> [image: Inline images 1]
>
>
> For some reason the rdkit function Draw.MolToImage(mol) only picks the
> first atom from the list of heteroatoms or from the list of halogens and
> returns:
>
>
> [image: Inline images 2]
>
>
>
> I would be even happier if I could get a structure image with  “X” for
> halogen and “Q” for heteroatoms:
>
>
> [image: Inline images 3]
>
>
> Thanks,
>
>
> Alexis
>


Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Peter Gedeck
According to this:
https://en.wikipedia.org/wiki/List_of_moments_of_inertia
The moments of inertia of a disk (something like benzene) are:

Iz = mr^2/2
Ix = Iy = mr^2/4

None of them is zero. The smallest moment of inertia of a rod-like molecule
(e.g. C#C) is zero.
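
A quick numerical check with idealised point masses illustrates this. The sketch below is not the RDKit PMI code: it uses six equal masses on a regular hexagon as a stand-in for a benzene-like ring and four collinear masses as a stand-in for a rod, with made-up coordinates. For point masses on a ring (total mass M) the moments are Iz = M r^2 and Ix = Iy = M r^2/2, which differ from the uniform-disk formulas by constant factors, but the conclusion is the same.

```python
import math


def inertia_diagonal(points, mass=1.0):
    """Diagonal of the inertia tensor for equal point masses about the origin.

    Also returns the largest off-diagonal magnitude so the caller can check
    that the chosen axes are already principal axes.
    """
    ixx = sum(mass * (y * y + z * z) for x, y, z in points)
    iyy = sum(mass * (x * x + z * z) for x, y, z in points)
    izz = sum(mass * (x * x + y * y) for x, y, z in points)
    off = max(
        abs(sum(mass * x * y for x, y, z in points)),
        abs(sum(mass * x * z for x, y, z in points)),
        abs(sum(mass * y * z for x, y, z in points)),
    )
    return (ixx, iyy, izz), off


r = 1.4  # illustrative ring radius
# Idealised planar ring: six equal masses on a regular hexagon in the xy-plane.
ring = [(r * math.cos(k * math.pi / 3), r * math.sin(k * math.pi / 3), 0.0)
        for k in range(6)]
# Idealised rod: equal masses spaced along the z-axis, centred on the origin.
rod = [(0.0, 0.0, z) for z in (-1.8, -0.6, 0.6, 1.8)]

ring_moments, ring_off = inertia_diagonal(ring)
rod_moments, rod_off = inertia_diagonal(rod)
print("ring:", ring_moments)
print("rod: ", rod_moments)
```

The ring has three non-zero moments with the out-of-plane one equal to the sum of the two in-plane ones, whereas the rod has a zero moment about its own axis.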

Best,

Peter



On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum  wrote:

> Hi Guillaume,
>
> I think in this case it's something else. According to the Todeschini
> article the smallest moment of inertia of a planar molecule like benzene
> should be zero. The eigenvalues of the inertia matrix for benzene, however,
> are definitely not zero (and not close enough that it's likely to be
> round-off error).
> It would be very nice if you could run the three files I mention through
> Dragon and let me know what it calculates for those descriptors.
>
> -greg
>
>
> _
> From: Guillaume GODIN 
> Sent: Sunday, January 15, 2017 1:11 PM
> Subject: RE: [Rdkit-discuss] PMI API
> To: Greg Landrum, RDKit Discuss <rdkit-discuss@lists.sourceforge.net>,
> Chris Earnshaw
>
>
>
> Dear Greg,
>
>
> I suspect that it's a precision error or a difference in the eigenvalue
> algorithm between the RDKit C++ code and Dragon.
>
>
> To obtain good values, I suggest implementing a test on the eigenvalues
> like I did in the gateway.cpp implementation.
>
>
>
> JacobiSVD<MatrixXd> getSVD(MatrixXd A) {
>   JacobiSVD<MatrixXd> mysvd(A, ComputeThinU | ComputeThinV);
>   return mysvd;
> }
>
> // get the A^-1 (pseudo-inverse) matrix using SVD
> MatrixXd GetPinv(MatrixXd A) {
>   JacobiSVD<MatrixXd> svd = getSVD(A);
>   double pinvtoler = 1.e-2;  // choose your tolerance wisely!
>   VectorXd vs = svd.singularValues();
>   VectorXd vsinv = svd.singularValues();
>
>   // the loop bound and template arguments were lost in the archived
>   // message; vs.size() and <MatrixXd> are assumed here
>   for (unsigned int i = 0; i < vs.size(); ++i) {
>     if (vs(i) > pinvtoler)
>       vsinv(i) = 1.0 / vs(i);
>     else
>       vsinv(i) = 0.0;
>   }
>
>   MatrixXd S = vsinv.asDiagonal();
>   MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();
>   return Ap;
> }
>
>
> If that does not solve the problem, I would like to test it in Matlab. Can you
> provide me with the three 3D xyz matrices from your example, please?
>
>
> I also have Dragon 6
>
>
> best regards,
>
> *Dr. Guillaume GODIN*
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R&D DIVISION
> DIRECT LINE +41 (0)22 780 3645
> MOBILE  +41 (0)79 536 1039
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
> --
> *De :* Greg Landrum 
> *Envoyé :* dimanche 15 janvier 2017 11:50
> *À :* Chris Earnshaw; RDKit Discuss
> *Objet :* Re: [Rdkit-discuss] PMI API
>
> I managed to make some time to look into this this weekend and I've found
> a bug and something I don't understand. Hopefully the community can help
> out here.
> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw 
> wrote:
>
> 4) The big one! The returned results look very odd. They appear to relate
> more to the dimensions of the molecule than the moments of inertia. For a
> rod-like molecule (dimethylacetylene) I'd expect two large and one small
> PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828
> NPR2: 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:
> 25.3699  NPR1: 0.002430  NPR2: 0.002430.
> For disk-like (benzene) the result should be one large and two medium
> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2:
> 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1:
> 2.14213e-11  NPR2: 0.33.
> Finally for a roughly spherical molecule (neopentane) the NPR values look
> reasonable (no great surprise) but the absolute PMI values may be too
> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2:
> 6.59488  PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>
>
> Your expectations are correct: the current RDKit implementation is wrong.
> The corresponding github entry is here:
> https://github.com/rdkit/rdkit/issues/1262
> This is due to a mistake in the way the principal moments are calculated
> (which is due to the fact that I don't spend a lot of time working
> with/thinking about 3D descriptors). Instead of using the
> eigenvectors/eigenvalues of the inertia matrix (the tensor of inertia) the
> RDKit is currently using the covariance matrix. There's some more on the
> relationship between these two here:
> http://number-none.com/blow/inertia/deriving_i.html
>
> The problem is easy to fix (and I have something working here:
> https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws
> up the values of the descriptors that are derived from here:
> Todeschini and Consonni "Descriptors from Molecular Geometry" Handbook of
> 

Re: [Rdkit-discuss] SetAtomAlias

2016-12-17 Thread Peter Gedeck
Hello

I tried it with the master branch. The function was added August 10, so
maybe too late for the current release. That commit added functions to
get/set atom specific MDL features (RLabel, atom alias, atom value).

Best

Peter
On Sat, Dec 17, 2016 at 7:47 AM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear Peter,
>
> I got:
>
> AttributeError: 'module' object has no attribute 'SetAtomAlias'
>
> with your example code, below.
>
> Best regards,
>
> Jean-Marc
>
>
> Le 17/12/2016 à 00:44, Peter Gedeck a écrit :
>
> Hello,
>
> SetAtomAlias is available in Python as a function and not as an Atom method:
>
> from rdkit import Chem
> import sys
> m = Chem.MolFromSmiles('CCC')
> for i, atom in enumerate(m.GetAtoms()):
>     Chem.SetAtomAlias(atom, 'C' + str(i + 1))
> w = Chem.SDWriter(sys.stdout)
> w.write(m)
> w.close()
>
> Best,
>
> Peter
>
>
> On Fri, Dec 16, 2016 at 5:31 PM Paolo Tosco <paolo.to...@unito.it> wrote:
>
> Dear Jean-Marc,
>
> here:
>
>
> https://gist.github.com/ptosco/6e4468350f0fff183e4507ef24f092a1#file-pdb_atom_names-ipynb
>
>
> there's an example how to use the atom aliases in RDKit.
>
> Cheers,
> p.
>
>
> On 12/16/2016 10:26 PM, Jean-Marc Nuzillard wrote:
> > Hi all,
> >
> > I am trying to add labels to atoms in a molecule, so that lines like
> >
> > A1
> > C12
> > A2
> > C3
> >
> > are written when the molecule is written in a SD file.
> >
> > Considering atom a and alias text txt,
> > I expected the function call SetAtomAlias(a, txt) to do the job.
> > I found this function in a documentation page about the rdchem module.
> > So, my script started with
> >
> > from rdkit import Chem
> > from rdkit.Chem import rdchem
> >
> > I got:
> >
> > NameError: name 'SetAtomAlias' is not defined.
> >
> > I guess the solution is trivial.
> > Forgive my ignorance.
> >
> > All the best,
> >
> > Jean-Marc
> >
>
>
>
>
>
> --
> Jean-Marc Nuzillard
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/ICMR
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>
>


Re: [Rdkit-discuss] SetAtomAlias

2016-12-16 Thread Peter Gedeck
Hello,

SetAtomAlias is available in Python as a function and not as an Atom method:

from rdkit import Chem
import sys
m = Chem.MolFromSmiles('CCC')
for i, atom in enumerate(m.GetAtoms()):
    Chem.SetAtomAlias(atom, 'C' + str(i + 1))
w = Chem.SDWriter(sys.stdout)
w.write(m)
w.close()

Best,

Peter


On Fri, Dec 16, 2016 at 5:31 PM Paolo Tosco  wrote:

> Dear Jean-Marc,
>
> here:
>
>
> https://gist.github.com/ptosco/6e4468350f0fff183e4507ef24f092a1#file-pdb_atom_names-ipynb
>
>
> there's an example how to use the atom aliases in RDKit.
>
> Cheers,
> p.
>
>
> On 12/16/2016 10:26 PM, Jean-Marc Nuzillard wrote:
> > Hi all,
> >
> > I am trying to add labels to atoms in a molecule, so that lines like
> >
> > A1
> > C12
> > A2
> > C3
> >
> > are written when the molecule is written in a SD file.
> >
> > Considering atom a and alias text txt,
> > I expected the function call SetAtomAlias(a, txt) to do the job.
> > I found this function in a documentation page about the rdchem module.
> > So, my script started with
> >
> > from rdkit import Chem
> > from rdkit.Chem import rdchem
> >
> > I got:
> >
> > NameError: name 'SetAtomAlias' is not defined.
> >
> > I guess the solution is trivial.
> > Forgive my ignorance.
> >
> > All the best,
> >
> > Jean-Marc
> >
>
>
>


Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Peter Gedeck
Hello Alexis,

Depending on the size of your document, you could consider limiting the
memoization of already tested strings by word length and only memoize
shorter words. SMILES tend to be longer, so anything above a given number of
characters has a higher probability of being a SMILES. Long words are also
often chemical names. These frequently contain commas (,), so they are
easy to filter out quickly.
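
Something along these lines is what I have in mind. It is only a sketch: extract_smiles, the max_memo_length threshold of 8 and the stand-in validator are all made up for illustration, and with RDKit you would pass a function that wraps Chem.MolFromSmiles instead.

```python
def extract_smiles(words, is_valid_smiles, max_memo_length=8):
    """Return the words accepted by is_valid_smiles, preserving order.

    Rejected words are remembered only when they are short; words containing
    a comma are skipped outright, since they are more likely chemical names.
    """
    rejected_short = set()
    accepted = set()
    hits = []
    for word in words:
        if "," in word:
            continue
        if word in accepted:
            hits.append(word)
            continue
        if word in rejected_short:
            continue
        if is_valid_smiles(word):
            accepted.add(word)
            hits.append(word)
        elif len(word) <= max_memo_length:
            rejected_short.add(word)
    return hits


# Stand-in validator so the sketch runs without RDKit; with RDKit installed one
# could pass: lambda w: Chem.MolFromSmiles(w) is not None
known = {"CCO", "c1ccccc1"}
calls = []


def fake_validator(word):
    calls.append(word)
    return word in known


text = "the the solvent CCO and c1ccccc1 or 1,2-dichloroethane the CCO".split()
result = extract_smiles(text, fake_validator)
print(result)
```

Frequent short words such as "the" are only tested once, and the set of remembered rejects stays small because long strings are never stored.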

Best,

Peter

On Fri, Dec 2, 2016 at 5:43 AM Alexis Parenty 
wrote:

> Dear Pavel And Greg,
>
>
>
> Thanks Greg for the regexps link. I’ll use that too.
>
>
> Pavel, I need to track which document the SMILES come from, but I
> will indeed make a set of unique words for each document before looping.
> Thanks!
>
> Best,
>
> Alexis
>
> On 2 December 2016 at 11:21, Pavel  wrote:
>
> Hi, Alexis,
>
>   if you do not need to track which document the SMILES come from, you may just
> combine all words from all documents in a list, take only the unique words and
> try to test them. Then you do not need to store and check valid/non-valid
> strings. That would reduce the problem complexity as well.
>
> Pavel.
> On 12/02/2016 11:11 AM, Greg Landrum wrote:
>
> An initial start on some regexps that match SMILES is here:
> https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb
>
> that may also be useful
>
> On Fri, Dec 2, 2016 at 11:07 AM, Alexis Parenty <
> alexis.parenty.h...@gmail.com> wrote:
>
> Hi Markus,
>
>
> Yes! I might discover novel compounds that way!! Would be interesting to
> see what they look like…
>
>
> Good suggestion to also store the words that were correctly identified as
> SMILES. I’ll add that to the script.
>
>
> I also like your “distribution of word” idea. I could safely skip any
> words that occur more than 1% of the time and could try to play around with
> the threshold to find an optimum.
>
>
> I will try every suggestion and will time it to see what is best. I’ll
> keep everyone in the loop and will share the script and results.
>
>
> Thanks,
>
>
> Alexis
>
> On 2 December 2016 at 10:47, Markus Sitzmann 
> wrote:
>
> Hi Alexis,
>
> you may also find some "novel" compounds by this approach :-).
>
> Whether your tuple solution improves performance strongly depends on the
> content of your text documents and how often they repeat the same words
> again - but my guess would be it will help. Probably the best way is even
> to look at the distribution of words before you feed them to RDKit. You
> should also "memorize" those ones that successfully generated a structure,
> doesn't make sense to do it again, then.
>
> Markus
>
> On Fri, Dec 2, 2016 at 10:21 AM, Maciek Wójcikowski  > wrote:
>
> Hi Alexis,
>
> You may want to use a regex to filter out strings containing invalid
> characters (i.e. there is only a small subset of atoms that may appear without
> brackets). See the "Atoms" section:
> http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
>
> The set might grow pretty quick and may be inefficient, so I'd parse all
> strings passing above filter. Although there will be some false positives
> like "CC" which may occur in text (emails especially).
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2016-12-02 10:11 GMT+01:00 Alexis Parenty :
>
> Dear all,
>
>
> I am looking for a way to extract SMILES scattered in many text documents
> (thousands documents of several pages each).
>
> At the moment, I am thinking to scan each words from the text and try to
> make a mol object from them using Chem.MolFromSmiles() then store the words
> if they return a mol object that is not None.
>
> Can anyone think of a better/quicker way?
>
>
> Would it be worth storing in a set any word that does not return a mol
> object from Chem.MolFromSmiles() and excluding it from subsequent searches?
>
>
> Something along those lines
>
>
> excluded_set = set()
> smiles_list = []
>
> for each_word in text:
>     if each_word not in excluded_set:
>         each_word_mol = Chem.MolFromSmiles(each_word)
>         if each_word_mol is not None:
>             smiles_list.append(each_word)
>         else:
>             # store the rejected word itself, not the (None) mol object
>             excluded_set.add(each_word)
>
>
> Would searching that growing set not actually take more time than
> trying to blindly make a mol object for every word?
>
>
>
> Any suggestion?
>
>
> Many thanks and regards,
>
>
> Alexis
>
>
>
>
>
> 

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Peter Gedeck
Is it possible to use the bulk similarity searching functionality for
better performance instead of the list comprehension?
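
What I have in mind is DataStructs.BulkTanimotoSimilarity, which scores one query fingerprint against a whole list in a single call. A minimal sketch with a handful of illustrative molecules instead of the data frame (the SMILES and the choice of query are made up; the fingerprint call is the one from Greg's example, which newer releases may flag as deprecated):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

smiles = ["CCO", "CCN", "c1ccccc1O", "c1ccccc1N"]  # illustrative molecules
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [rdMolDescriptors.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

qry = fps[2]  # use phenol as the query
# One call scores the query against every fingerprint in the list.
sims = DataStructs.BulkTanimotoSimilarity(qry, fps)
hits = [smi for smi, sim in zip(smiles, sims) if sim >= 0.7]
print(hits)
```

With a data frame, the list passed to the bulk call would be list(df['mfp2']), and the returned similarities can be stored as a new column and filtered on.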

Best,

Peter


On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum  wrote:

No worries.
This, and Anna's question about similarity searching and clustering
illustrate a great opportunity for a tutorial on fingerprints and
similarity searching.

-greg





On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain"  wrote:

Thanks for this,

As a chemist who comes from the “cut and paste” school of scripting I’m
always concerned I’m asking something blindingly obvious

;-)

Chris

On 23 Nov 2016, at 12:36, Greg Landrum  wrote:

[including rdkit-discuss, because it's relevant there and I'm pretty sure
Chris won't mind and the real Pandas experts may have a better answer than
me.]

On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain  wrote:


I quite like storing molecules and associated data in a data frame and I’ve
seen that it is possible to use rdkit for substructure searching. Is it
possible to also do similarity searching?


It's not built in since there are many possible fingerprints that could be
used.

It's not quite as convenient as the substructure search, but here's a
little demo of what you can do to filter based on similarity:

# Start by adding a fingerprint column:
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]

# and now filter:
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]

In [23]: len(df)
Out[23]: 1000
In [24]: len(ndf)
Out[24]: 2

-greg




Re: [Rdkit-discuss] Trouble compiling and installing on Ubuntu 14.04

2016-10-04 Thread Peter Gedeck
One of the tests says:

> ImportError: libInfoTheory.so.1: cannot open shared object file: No such file or directory

Did you run "make install", and does LD_LIBRARY_PATH contain $RDBASE/lib?
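
If it helps, the second part can be checked with a few lines of Python. This is just a sketch; lib_dir_on_search_path is a hypothetical helper and it only compares path strings, it does not check that the library files are actually there.

```python
import os


def lib_dir_on_search_path(rdbase, ld_library_path):
    """Return True if <rdbase>/lib is one of the entries in ld_library_path."""
    wanted = os.path.normpath(os.path.join(rdbase, "lib"))
    entries = [os.path.normpath(p) for p in ld_library_path.split(os.pathsep) if p]
    return wanted in entries


rdbase = os.environ.get("RDBASE", "")
current = os.environ.get("LD_LIBRARY_PATH", "")
if rdbase:
    print("RDBASE/lib on LD_LIBRARY_PATH:", lib_dir_on_search_path(rdbase, current))
else:
    print("RDBASE is not set")
```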

Best,

Peter



On Tue, Oct 4, 2016 at 11:18 AM Philip Adler  wrote:

> Unfortunately David Hall's suggestion has been unsuccessful, and adds new
> failures into the tests.
>
> The output from ctest --debug is here
> 
> .
>
> Best, and thanks to all for helping so far,
>
> Phil
>
> On Mon, Oct 3, 2016 at 11:08 PM, Greg Landrum 
> wrote:
>
> Hmmm, I didn't see anything below that looks odd.
> It's difficult for me to provide any more detailed help at the moment
> since I'm on vacation and don't have access to either my linux machine or
> to a good network connection (to get a proper docker environment set up).
>
> David Hall's suggestion to just specify the python executable for cmake is
> a good one.
> Or you could try setting the alias:
> alias python='/usr/bin/python3'
> in your .bashrc file.
> In either case, deleting the existing build directory and starting from
> scratch (part of David's suggestion) is a good one.
>
> Best,
> -greg
>
>
>
>
> On Mon, Oct 3, 2016 at 7:59 PM, Philip Adler 
> wrote:
>
> Greg (with apologies for the repeat for the benefit of the mailing list
> -gmail is great up until it isn't!),
>
> Please see below,
>
>
>


Re: [Rdkit-discuss] Trouble compiling and installing on Ubuntu 14.04

2016-10-03 Thread Peter Gedeck
You can also check the CMakeCache.txt file in the build directory. When I
last compiled for 3.5 on the Mac, I had to correct the PYTHON_INCLUDE_DIR.

Greg, PYTHON_INCLUDE_DIR was incorrectly set after "cmake ..". The executable
and library were found correctly.


//Path to a program.
PYTHON_EXECUTABLE:FILEPATH=/Users/peter/miniconda3/bin/python
//Path to a file.
PYTHON_INCLUDE_DIR:PATH=/System/Library/Frameworks/Python.framework/Headers
//Path to a library.
PYTHON_LIBRARY:FILEPATH=/Users/peter/miniconda3/lib/libpython3.5m.dylib
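
To pull these entries out of a cache file without opening it in an editor, a small parser like the following is enough. It is a sketch that assumes the usual NAME:TYPE=VALUE layout shown above; read_cmake_cache_entries is a hypothetical helper, and the sample text is simply the excerpt from my build.

```python
def read_cmake_cache_entries(text, prefix="PYTHON_"):
    """Parse NAME:TYPE=VALUE lines from CMakeCache.txt content into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache file.
        if not line or line.startswith(("//", "#")):
            continue
        name_type, sep, value = line.partition("=")
        if not sep:
            continue
        name = name_type.split(":", 1)[0]
        if name.startswith(prefix):
            entries[name] = value
    return entries


sample = """\
//Path to a program.
PYTHON_EXECUTABLE:FILEPATH=/Users/peter/miniconda3/bin/python
//Path to a file.
PYTHON_INCLUDE_DIR:PATH=/System/Library/Frameworks/Python.framework/Headers
//Path to a library.
PYTHON_LIBRARY:FILEPATH=/Users/peter/miniconda3/lib/libpython3.5m.dylib
"""

entries = read_cmake_cache_entries(sample)
for name, value in sorted(entries.items()):
    print(f"{name} = {value}")
```

In this excerpt the executable and library point into the miniconda3 tree while the include directory points at the system framework, which is the mismatch I had to correct.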


Best,

Peter




On Mon, Oct 3, 2016 at 2:00 PM Philip Adler  wrote:

> Greg (with apologies for the repeat for the benefit of the mailing list
> -gmail is great up until it isn't!),
>
> Please see below,
>
> import _frozen_importlib # frozen
> import imp # builtin
> import sys # builtin
> # installing zipimport hook
> # installed zipimport hook
> # /usr/lib/python3.4/encodings/__pycache__/__init__.cpython-34.pyc matches
> /usr/lib/python3.4/encodings/__init__.py
> # code object from
> '/usr/lib/python3.4/encodings/__pycache__/__init__.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/codecs.cpython-34.pyc matches
> /usr/lib/python3.4/codecs.py
> # code object from '/usr/lib/python3.4/__pycache__/codecs.cpython-34.pyc'
> import 'codecs' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa7a14dd8>
> # /usr/lib/python3.4/encodings/__pycache__/aliases.cpython-34.pyc matches
> /usr/lib/python3.4/encodings/aliases.py
> # code object from
> '/usr/lib/python3.4/encodings/__pycache__/aliases.cpython-34.pyc'
> import 'encodings.aliases' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa7a2a908>
> import 'encodings' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa7a149b0>
> # /usr/lib/python3.4/encodings/__pycache__/utf_8.cpython-34.pyc matches
> /usr/lib/python3.4/encodings/utf_8.py
> # code object from
> '/usr/lib/python3.4/encodings/__pycache__/utf_8.cpython-34.pyc'
> import 'encodings.utf_8' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79b75f8>
> # /usr/lib/python3.4/encodings/__pycache__/latin_1.cpython-34.pyc matches
> /usr/lib/python3.4/encodings/latin_1.py
> # code object from
> '/usr/lib/python3.4/encodings/__pycache__/latin_1.cpython-34.pyc'
> import 'encodings.latin_1' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79b9160>
> # /usr/lib/python3.4/__pycache__/io.cpython-34.pyc matches
> /usr/lib/python3.4/io.py
> # code object from '/usr/lib/python3.4/__pycache__/io.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/abc.cpython-34.pyc matches
> /usr/lib/python3.4/abc.py
> # code object from '/usr/lib/python3.4/__pycache__/abc.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/_weakrefset.cpython-34.pyc matches
> /usr/lib/python3.4/_weakrefset.py
> # code object from
> '/usr/lib/python3.4/__pycache__/_weakrefset.cpython-34.pyc'
> import '_weakrefset' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79b9e48>
> import 'abc' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79b9630>
> import 'io' # <_frozen_importlib.SourceFileLoader object at 0x7f5aa79b9390>
> # /usr/lib/python3.4/__pycache__/site.cpython-34.pyc matches
> /usr/lib/python3.4/site.py
> # code object from '/usr/lib/python3.4/__pycache__/site.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/os.cpython-34.pyc matches
> /usr/lib/python3.4/os.py
> # code object from '/usr/lib/python3.4/__pycache__/os.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/stat.cpython-34.pyc matches
> /usr/lib/python3.4/stat.py
> # code object from '/usr/lib/python3.4/__pycache__/stat.cpython-34.pyc'
> import 'stat' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79e4390>
> # /usr/lib/python3.4/__pycache__/posixpath.cpython-34.pyc matches
> /usr/lib/python3.4/posixpath.py
> # code object from
> '/usr/lib/python3.4/__pycache__/posixpath.cpython-34.pyc'
> # /usr/lib/python3.4/__pycache__/genericpath.cpython-34.pyc matches
> /usr/lib/python3.4/genericpath.py
> # code object from
> '/usr/lib/python3.4/__pycache__/genericpath.cpython-34.pyc'
> import 'genericpath' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79e5c50>
> import 'posixpath' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79e45f8>
> # /usr/lib/python3.4/__pycache__/_collections_abc.cpython-34.pyc matches
> /usr/lib/python3.4/_collections_abc.py
> # code object from
> '/usr/lib/python3.4/__pycache__/_collections_abc.cpython-34.pyc'
> import '_collections_abc' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79e5b70>
> import 'os' # <_frozen_importlib.SourceFileLoader object at 0x7f5aa79d0be0>
> # /usr/lib/python3.4/__pycache__/_sitebuiltins.cpython-34.pyc matches
> /usr/lib/python3.4/_sitebuiltins.py
> # code object from
> '/usr/lib/python3.4/__pycache__/_sitebuiltins.cpython-34.pyc'
> import '_sitebuiltins' # <_frozen_importlib.SourceFileLoader object at
> 0x7f5aa79d0ba8>
> # /usr/lib/python3.4/__pycache__/sysconfig.cpython-34.pyc matches
> 

Re: [Rdkit-discuss] Trouble compiling and installing on Ubuntu 14.04

2016-10-03 Thread Peter Gedeck
Hello

Failures in the Python tests are usually an indication of problems with the
boost library. You might be picking up libraries built for the wrong Python
version.
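
One way to see what the build should be pointing at is to ask the intended interpreter for its own paths and compare them with the PYTHON_* values given to cmake. This is a rough sketch using only the standard library; `describe_interpreter` is a made-up name, and the output does not by itself tell you which Python version Boost.Python was built against.

```python
import sys
import sysconfig


def describe_interpreter():
    """Collect the paths a build would need for the running interpreter."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "include_dir": sysconfig.get_paths()["include"],
        # LIBDIR can be None on some platforms.
        "lib_dir": sysconfig.get_config_var("LIBDIR"),
    }


for key, value in describe_interpreter().items():
    print("%-12s %s" % (key, value))
```

Run it with the same interpreter that is passed as PYTHON_EXECUTABLE (for example /usr/bin/python3.4). If the include or library directory printed here differs from what cmake was given, that is a likely place for a version mix-up.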

Best

Peter
On Mon, Oct 3, 2016 at 11:06 AM Philip Adler  wrote:

> Dear All,
>
> I am trying to compile rdkit to run with Python3.4 on Ubuntu 14.04 as per
> the instructions at http://www.rdkit.org/docs/Install.html For reasons
> which I don't really want to get into we would prefer to avoid anaconda for
> the time being.
>
> That being said, when I try to test the build, I get a number of errors
> and failed tests (the build does not work when called from Python, which is
> unsurprising).
>
> For reference, the cmake command I issued was:
>
> cmake -DRDK_BUILD_INCHI_SUPPORT=ON \
>   -D PYTHON_LIBRARY=/usr/lib/python3.4/config-3.4m-x86_64-linux-gnu/libpython3.4.so \
>   -D PYTHON_INCLUDE_DIR=/usr/include/python3.4/ \
>   -D PYTHON_EXECUTABLE=/usr/bin/python3.4 \
>   -DBOOST_ROOT=/usr/lib/x86_64-linux-gnu/ ..
>
> The tests which fail are as follows:
>
> The following tests FAILED:
>   5 - pyBV (SEGFAULT)
>   6 - pyDiscreteValueVect (SEGFAULT)
>   7 - pySparseIntVect (SEGFAULT)
>   8 - pyFPB (SEGFAULT)
>  11 - testPyGeometry (SEGFAULT)
>  14 - pyAlignment (Failed)
>  18 - pyForceFieldConstraints (SEGFAULT)
>  20 - pyDistGeom (Failed)
>  33 - pyDepictor (SEGFAULT)
>  45 - pyChemReactions (SEGFAULT)
>  50 - pyFilterCatalog (SEGFAULT)
>  52 - pyFragCatalog (SEGFAULT)
>  54 - pyMolDescriptors (SEGFAULT)
>  57 - pyPartialCharges (SEGFAULT)
>  59 - pyMolTransforms (SEGFAULT)
>  63 - pyForceFieldHelpers (SEGFAULT)
>  65 - pyDistGeom (SEGFAULT)
>  67 - pyMolAlign (SEGFAULT)
>  69 - pyChemicalFeatures (SEGFAULT)
>  71 - pyShapeHelpers (SEGFAULT)
>  73 - pyMolCatalog (SEGFAULT)
>  75 - pyMolDraw2D (SEGFAULT)
>  77 - pyFMCS (SEGFAULT)
>  80 - pyMolHash (SEGFAULT)
>  82 - pyMMPA (SEGFAULT)
>  84 - pyReducedGraphs (SEGFAULT)
>  86 - pySLNParse (SEGFAULT)
>  87 - pyGraphMolWrap (SEGFAULT)
>  88 - pyTestConformerWrap (SEGFAULT)
>  89 - pyTestThreads (SEGFAULT)
>  92 - pyMatCalc (SEGFAULT)
>  93 - pySimDivPickers (SEGFAULT)
>  94 - pyRanker (Failed)
>  96 - pyFeatures (SEGFAULT)
>  97 - pythonTestDbCLI (Failed)
>  98 - pythonTestDirML (Failed)
>  99 - pythonTestDirDataStructs (Failed)
> 101 - pythonTestDirSimDivFilters (Failed)
> 102 - pythonTestDirVLib (Failed)
> 103 - pythonTestDirChem (SEGFAULT)
>
> I must confess I'm a little out of my depth right now, so I don't even
> know where to begin debugging this. Any advice would be greatly appreciated,
>
> Best,
>
> Phil
>


Re: [Rdkit-discuss] Angstroms Hydrogen bonding

2016-09-14 Thread Peter Gedeck
Hello

Here are a few suggestions that may speed up your code.

Instead of GetSubstructMatches, you can use the list of neighbours of each
atom. Here is something that may work: it creates an iterator that returns
the atom positions for every bond angle. I did not test it, but it may give
you an idea.

from itertools import combinations


def iterAngles(mol):
    """ Return an iterator over all angles in molecule """
    natoms = mol.GetNumAtoms()
    # Get the coordinates of all atoms
    conf = mol.GetConformer()
    coords = [conf.GetAtomPosition(i) for i in range(natoms)]
    for atom in mol.GetAtoms():
        center = coords[atom.GetIdx()]
        neighbours = [a.GetIdx() for a in atom.GetNeighbors()]
        # Every pair of neighbours defines one angle at this atom
        for a1, a2 in combinations(neighbours, 2):
            yield coords[a1], center, coords[a2]

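
The neighbour-pair idea can be tried without RDKit. The sketch below is a stand-in, not RDKit code: it uses a plain dictionary of atom indices in place of a molecule, and `iter_angle_triples` is a made-up name.

```python
from itertools import combinations


def iter_angle_triples(neighbours):
    """Yield (a1, centre, a2) index triples, one per bond angle.

    `neighbours` maps each atom index to the indices bonded to it.
    """
    for centre, bonded in neighbours.items():
        # An atom with n neighbours contributes n * (n - 1) / 2 angles.
        for a1, a2 in combinations(sorted(bonded), 2):
            yield a1, centre, a2


# Methane: carbon 0 bonded to hydrogens 1 to 4.
methane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
triples = list(iter_angle_triples(methane))
print(len(triples))  # 6 H-C-H angles
print(triples[0])    # (1, 0, 2)
```

Each angle is produced exactly once, centred on its middle atom, and atoms with a single neighbour contribute nothing.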
The other suggestion is to reduce the number of calls to numpy functions. It
is not necessary to calculate the angle; instead, use the cos(angle) value
in the comparison. I also avoid calling np.linalg.norm, which calls sqrt
once for each vector; a single sqrt of the product of the squared norms is
enough.

import numpy as np


def cangle3(v1, v2):
    v12 = np.dot(v1, v2)
    v11v22 = np.dot(v1, v1) * np.dot(v2, v2)
    c = v12 / np.sqrt(v11v22)  # -> cosine of the angle
    return np.clip(c, -1, 1)


def angleContribution3(angles):
    """ Return the angle contributions """
    a60 = a90 = a102 = aUnmatched = 0
    cos102 = np.cos(102.0 / 180 * np.pi)
    for a1, c, a2 in angles:
        v1 = np.array([a1.x - c.x, a1.y - c.y, a1.z - c.z])
        v2 = np.array([a2.x - c.x, a2.y - c.y, a2.z - c.z])
        cosAngle = cangle3(v1, v2)
        if 0.5 <= cosAngle:
            a60 += 1
        elif 0 <= cosAngle < 0.5:
            a90 += 1
        elif cos102 <= cosAngle < 0:
            a102 += 1
        else:
            aUnmatched += 1
    return a60, a90, a102, aUnmatched
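
The reason the cosine can replace the angle in the comparison is that cos is strictly decreasing between 0 and 180 degrees, so "angle <= 60" is the same test as "cos >= 0.5", and so on. Here is a small check of that using only the math module. The function names are made up for the illustration, and the test angles are chosen to stay away from the exact bin boundaries, where rounding could go either way.

```python
import math


def bin_by_angle(angle_deg):
    """Bin an angle given in degrees (the original comparison)."""
    if angle_deg <= 60:
        return "a60"
    if angle_deg <= 90:
        return "a90"
    if angle_deg <= 102:
        return "a102"
    return "unmatched"


def bin_by_cosine(c):
    """Bin the same angle from its cosine, without calling acos."""
    cos102 = math.cos(math.radians(102.0))
    if c >= 0.5:
        return "a60"
    if c >= 0:
        return "a90"
    if c >= cos102:
        return "a102"
    return "unmatched"


# Compare both binnings on angles 0.25, 1.25, ..., 179.25 degrees.
mismatches = []
for k in range(180):
    angle = k + 0.25
    if bin_by_angle(angle) != bin_by_cosine(math.cos(math.radians(angle))):
        mismatches.append(angle)
print(len(mismatches))  # 0
```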

I tried this function with a random list of 100,000 vectors and get these
timings:

Your code: 1.6375901699066162
Remove arccos, abs, and degrees: 1.440047025680542
Remove call of linalg.norm: 0.9320440292358398

The following code gets it down to 0.7 seconds by removing the call to
sqrt. However, I think we lose clarity in the code here, and documentation
will be crucial.

def cangle4(v1, v2):
    v12 = np.dot(v1, v2)
    v11v22 = np.dot(v1, v1) * np.dot(v2, v2)
    return np.sign(v12) * (v12 * v12) / v11v22


def angleContribution4(angles):
    """ Return the angle contributions """
    a60 = a90 = a102 = aUnmatched = 0
    cos102 = np.cos(102.0 / 180 * np.pi)
    cos102 = -cos102 * cos102
    for a1, c, a2 in angles:
        v1 = np.array([a1.x - c.x, a1.y - c.y, a1.z - c.z])
        v2 = np.array([a2.x - c.x, a2.y - c.y, a2.z - c.z])
        cosAngle = cangle4(v1, v2)
        if 0.5 * 0.5 <= cosAngle:
            a60 += 1
        elif 0 <= cosAngle < 0.5 * 0.5:
            a90 += 1
        elif cos102 <= cosAngle < 0:
            a102 += 1
        else:
            aUnmatched += 1
    return a60, a90, a102, aUnmatched
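
Since this version is the hardest to read, here is the idea spelled out in plain Python. The quantity sign(c) * c * c is strictly increasing in c on [-1, 1], so comparing it against thresholds transformed the same way gives the same answer as comparing the cosine itself, and it can be computed from dot products alone. The sketch uses tuples and the math module instead of numpy, and the helper names are made up.

```python
import math


def dot(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))


def cosine(v1, v2):
    """Ordinary cosine of the angle between v1 and v2 (needs a sqrt)."""
    return dot(v1, v2) / math.sqrt(dot(v1, v1) * dot(v2, v2))


def signed_cos_squared(v1, v2):
    """sign(cos) * cos**2, computed without any square root."""
    v12 = dot(v1, v2)
    return math.copysign(v12 * v12, v12) / (dot(v1, v1) * dot(v2, v2))


def transform(threshold):
    """Apply the same sign-preserving square to a cosine threshold."""
    return math.copysign(threshold * threshold, threshold)


thresholds = [0.5, 0.0, math.cos(math.radians(102.0))]
disagreements = 0
for k in range(180):
    theta = math.radians(k + 0.25)  # stay away from the exact boundaries
    v1 = (2.0, 0.0, 0.0)
    v2 = (3.0 * math.cos(theta), 3.0 * math.sin(theta), 0.0)
    for t in thresholds:
        plain = cosine(v1, v2) >= t
        squared = signed_cos_squared(v1, v2) >= transform(t)
        if plain != squared:
            disagreements += 1
print(disagreements)  # 0
```

This is why the thresholds in the code above become 0.5 * 0.5, 0 and -cos102 * cos102.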

I'm curious if these changes will speed up your calculation.

Best,

Peter

On Wed, Sep 14, 2016 at 3:18 PM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear Greg,
>
>
> I found my mistake: I need the bond angle, not the torsion.
>
>
> I added this function for that...
>
>
> from numpy import (array, dot, arccos, clip, zeros, degrees)
> from numpy.linalg import norm
> from rdkit import Chem
>
>
> def cangle(v1, v2):
>     c = dot(v1, v2) / norm(v1) / norm(v2)  # -> cosine of the angle
>     return arccos(clip(c, -1, 1))  # if you really want the angle
>
> # THIS CODE IS WORKING BUT IT'S SLOW!
> def AnglesBond(mol):
>     angles = mol.GetSubstructMatches(Chem.MolFromSmarts('*~*~*'))
>     conf = mol.GetConformer()
>     A60 = 0
>     A90 = 0
>     A102 = 0
>     for ang in angles:
>         a1 = conf.GetAtomPosition(ang[0])
>         c = conf.GetAtomPosition(ang[1])
>         a2 = conf.GetAtomPosition(ang[2])
>         v1 = array([a1.x - c.x, a1.y - c.y, a1.z - c.z])
>         v2 = array([a2.x - c.x, a2.y - c.y, a2.z - c.z])
>         Angle = abs(degrees(cangle(v1, v2)))
>         if Angle <= 60:
>             A60 += 1
>         elif Angle > 60 and Angle <= 90:
>             A90 += 1
>         elif Angle > 90 and Angle <= 102:
>             A102 += 1
>     return (A60, A90, A102)
>
>
> the stats are not perfect but in progress.
>
>
> ​
>
>
>
> BR,
>
>
> Guillaume
> --
> *De :* Greg Landrum 
> *Envoyé :* mercredi 14 septembre 2016 14:14
>
> *À :* Guillaume GODIN
> *Cc :* RDKit Discuss
> *Objet :* Re: [Rdkit-discuss] Angstroms Hydrogen bonding
>
>
> On Wed, Sep 14, 2016 at 4:16 AM, Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>> Your solution is perfect!
>>
>>
>> glad it worked
>
>
>> I am currently implementing this article "
>> http://www.mdpi.com/1420-3049/20/10/18279; using RDKit.
>>
>>
> Interesting, I hadn't seen that one.
>
>
>
>> It's now almost done, I need to check the results on Heat of Formation
>> now and understand why my code to get Angle in torsions is not working.
>>
>> def Angles(mol):
>> tors =
>> mol.GetSubstructMatches(Chem.MolFromSmarts('[C]-[C;O;S]-[C;O;S]-[C]'))
>> conf = mol.GetConformer()
>> A60=0
>> A90=0
>> A102=0
>> for tor in tors:
>> Angle = abs(AllChem.GetDihedralDeg(conf,tor[0], tor[1], tor[2],
>> tor[3]))
>> if Angle<=60:
>> A60+=1
>> elif Angle>60 and Angle<=90:
>>  

Re: [Rdkit-discuss] Querying when using CTabs

2016-06-06 Thread Peter Gedeck
My solution for the problem was the following:

qmol = Chem.MolFromMolBlock(molblock)
for atom in qmol.GetAtoms():
  if atom.HasQuery():
    continue
  atom.SetNumExplicitHs(atom.GetTotalNumHs())

This gives a SMARTS like this:
[#7]1(-[#6](-[#6H2]-[#6,#8]-[#6H](-[#6H2]-1)-[*])=[#8])-[*]

This may be good enough for this specific user; however, it doesn't solve
the problem of the [C,O] query atom [#6,#8]. If that atom is C, the query
would allow additional substitution of this atom.

How is your solution handling it?

Best,

Peter




On Tue, Jun 7, 2016 at 1:06 AM Brian Kelley  wrote:

> An interesting conversation came up at work a few days ago regarding
> MolBlocks/CTABs with queries that behave in an unexpected manner.  I'm
> tackling some of these issues when it comes to reaction processing .rxn
> based files and plan on contributing it relatively soon.  However, I hadn't
> considered making it a generic Query based sanitization/processing.
>
>
> The basic question was "How do I get a MolBlock to only match the "R"'s
> and not allow substitutions anywhere else? like ChemAxon..."
>
>
> As it turns out, RDKit is very strict when it looks at RGroups.  This was
> the initial issue when I started sanitizing RGroups.  Basically there
> are several variants in the wild (ChemDraw/ICM) that make reactions that
> don't quite follow the CTAB spec.  RDKit likes the atom labeled R to (1)
> actually be in an "M  RGP" tag and (2) have an atom mapping.  If an atom is
> labeled "R" and not in a R_GRP it isn't considered a wild card for instance.
>
> Now queries don't really care about "M  RGP", but they do care that it
> isn't a dummy atom.  I'm listing below our current technique to fix these
> issues for CTAB queries and would like some feedback.
>
> Here is the workflow that we have been telling chemists during sketching:
>
> 1. Make a proper group.  The marvin-sketch/Chemdraw "R" is not enough, you
> can replace it with "A", but R has special semantics and needs an RGroup
> label defined.
> 2. aromatize where appropriate
> 3. (optionally) protonate so only RGroups can match
>
> These line up with the following RDKit code snippets:
>
> 1. Fix the "R"s (note we probably should make proper RGroups, but this
> just add dummy matches)
>
> from rdkit import Chem
> from rdkit.Chem import rdqueries
>
> qmol = Chem.MolFromMolBlock(molblock)
> # first, change the "R"'s into matching any atoms
> qmol = Chem.RWMol(qmol)
> for atom in qmol.GetAtoms():
>     if atom.GetAtomicNum() == 0:
>         qmol.ReplaceAtom(atom.GetIdx(),
>                          rdqueries.AtomNumGreaterQueryAtom(0))
>
>
> 2. aromatize - this might be good or might break things.  It seems to work
> great, even with conditional logic i.e. [C,O] but I'm unsure which atom is
> actually being used to form the Pi electrons for aromaticity checking.  I
> expect the first, actually.  In any case, something needs to happen in
> general for random inputs, otherwise the matching doesn't really do what is
> expected.
>
> # We want to see if we can find aromaticity, this may be complicated with
> #  query features [C,O] but it works ok.
> Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)
>
> 3. protonate if the desire is to only match RGroups
>
> # second, add explicit Hs so we only match the Rs
> # I'm unclear if this can fail in general, I would probably wrap this in
> #  a try...except block
> Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
> qmol = Chem.MergeQueryHs(Chem.AddHs(qmol))
>
> This could be enabled with flags into a SanitizeQuery function, or perhaps
> a PrepareQuery function.
>
> Thoughts?
>
> Cheers,
>  Brian
>
> --
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] OCN = NCO, and I don't want that.

2016-06-05 Thread Peter Gedeck
Hello

This is the expected behaviour. The path of length 2 creates one fragment
OCN. That fragment is the same if you start from oxygen or from the
nitrogen.

You will get a differentiation of the O and the N if you include paths of
different lengths. It could also be that substitution modifies the
fingerprint, for example OCNR and NCOR when starting at the O or the N
respectively. That depends on the definition of the layered fingerprints.

I suggest you use a rooted fingerprint with a min path of 0 up to a max path
of N. You can then try different values of N with your machine learning
algorithm and see how your predictive performance changes.
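
To make the argument concrete, here is a toy model in plain Python. It is not RDKit's fingerprint code and does not reproduce its hashing; it only assumes that a path and its reverse are treated as the same fragment. The function names and the list-of-symbols representation of an unbranched chain are made up for the illustration, and path lengths are counted in bonds.

```python
def path_key(symbols):
    """Direction-independent key: a path and its reverse are identical."""
    forward = tuple(symbols)
    return min(forward, forward[::-1])


def rooted_keys(chain, root, min_bonds, max_bonds):
    """Keys of all linear paths with min_bonds..max_bonds bonds from `root`.

    `chain` lists the element symbols of an unbranched molecule, so a path
    from the root is simply a slice towards either end of the chain.
    """
    keys = set()
    for n in range(min_bonds, max_bonds + 1):
        if root + n < len(chain):
            keys.add(path_key(chain[root:root + n + 1]))
        if root - n >= 0:
            keys.add(path_key(chain[root - n:root + 1]))
    return keys


chain = ["O", "C", "N"]

# Only paths of exactly two bonds: both roots see the single fragment O-C-N.
print(rooted_keys(chain, 0, 2, 2) == rooted_keys(chain, 2, 2, 2))  # True

# Paths of zero to two bonds: the shorter paths distinguish O from N.
print(rooted_keys(chain, 0, 0, 2) == rooted_keys(chain, 2, 0, 2))  # False
```

With only the two-bond path, both atoms produce the same set. Once the zero- and one-bond paths are included, the O root contributes O and C-O while the N root contributes N and C-N, so the sets differ.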

Best

Peter


On Sat, 4 Jun 2016 at 18:30, Esben Jannik Bjerrum <
esbenjan...@rocketmail.com> wrote:

> Hi RDkitters,
>I'm working on an application where I use fingerprints to analyze the
> local environment around an atom. However I get a bit of trouble with the
> fromAtoms option of e.g. the LayeredFingerPrint, which so far works the
> best in my application. I can illustrate the problem with the following
> script:
>
> from rdkit import Chem
> from rdkit.Chem import rdmolops
>
> mol = Chem.MolFromSmiles('OCN')
>
> #Create 2 fingerprints, each from a different atom.
> fp1 = rdmolops.LayeredFingerprint(mol, minPath=2, maxPath=2,
> fromAtoms=[0], fpSize=16)
> fp2 = rdmolops.LayeredFingerprint(mol, minPath=2, maxPath=2,
> fromAtoms=[2], fpSize=16)
>
> print "fp1 == fp2:", fp1 == fp2
> print "fp1.ToBitString()", fp1.ToBitString()
> print "fp2.ToBitString()", fp2.ToBitString()
>
> #Output
> #fp1 == fp2: True
> #fp1.ToBitString() 00011001
> #fp2.ToBitString() 00011001
>
> The two fingerprints are the same.
>
> However, I would like the path OCN to be different from the path NCO. I
> deliberately set the minPath and MaxPath to 2 to illustrate my point and
> the fingerprints get different due to the OC/NC path if they are not set.
> However, I suspect that the performance of my machine learning diminishes
> due to the "pollution" from the OCN or NCO related bits, depending if I'm
> analyzing the O or the N.
>
> I guess the behavior is on purpose, as it makes no sense that a molecule
> with a OCN  or NCO pattern should  give rise to two different paths and bit
> patterns when the goal is similarity comparison. However in my situation it
> could make a lot of difference as I'm working with a specific atom as
> "root".
>
> I tried to chase the behavior in the cpp code, but got lost somewhere in
> SubGraphs.cpp
>
> I hope  there is a solution that's not too difficult to implement.
>
> best Regards
> Esben Jannik Bjerrum
> cand.pharm, Ph.D
>
> /Sent from my Ubuntu Touch Phone
>
> Phone +45 2823 8009
> http://dk.linkedin.com/in/esbenbjerrum
> http://www.wildcardconsulting.dk
>
>


Re: [Rdkit-discuss] Stereochemistry Perception

2016-05-27 Thread Peter Gedeck
Hello Rob

The compound is not chiral. There is a mirror plane that contains the
5-ring and the C-NH3 bond. There are cis/trans stereoisomers here, as in
1,4-dichlorocyclohexane, but that cannot be defined using the @ symbols.
However, I cannot tell you how to do this for cases like this in SMILES.

Best

Peter
On Fri, 27 May 2016 at 21:38, Rob Smith  wrote:

> Hi all,
>
> I know there's been a lot of discussion about stereochemistry in RDKit,
> and I don't really want to open the can of worms particularly, but I would
> appreciate gaining a little more understanding to help explain an
> observation I've had (and it could be I've missed something really obvious).
>
> If I run the following Python RDKit code:
>
> from rdkit.Chem import AllChem as Chem
>
> AllMolecules = Chem.MolFromSmiles(
>     'CC[C@@]1(NC)CC[C@H](N)CC1.N[C@@H]1CC[C@]2(CC1)CCCN2.'
>     'N[C@H]1CCC[C@@]2(C1)CCCN2.N[C@H]2[C@@]12CCCN1')
> molecules = Chem.GetMolFrags(AllMolecules, asMols=True)
> for eachMolecule in molecules:
>     print(Chem.MolToSmiles(eachMolecule, isomericSmiles=True))
>
> The output I get is:
> CC[C@]1(NC)CC[C@@H](N)CC1
> NC1CCC2(CCCN2)CC1
> N[C@H]1CCC[C@]2(CCCN2)C1
> N[C@H]1[C@]12CCCN2
>
> The second molecule appears to be perceived as having no stereochemistry,
> however when a bond is broken in the pyrrolidine to remove the spirocentre
> (the first molecule), the molecule is perceived as having stereochemistry.
> Also moving the pyrrolidine spiro centre away from the 4 position of the
> cyclohexyl ring appears to enable the stereocentre to be 'perceived'.
>
> Thanks in advance for your help,
>
> Kind regards,
>
> Rob
>


Re: [Rdkit-discuss] SetProp behavior

2016-01-17 Thread Peter Gedeck
Hello

To change properties of a molecule, it is not necessary to convert to a
RWMol. This is required only if you want to modify the structure.

molsin2[0].GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')
molsin2[0].SetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT', str(5))
molsin2[0].GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')

The statement Chem.RWMol(mol) instantiates an object that is a copy of the
original molecule.

Chem.RWMol(molsin2[0]).SetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT', str(5))

The object is created and SetProp is called on this new object. After that
statement, nothing references the new RWMol object, so it is garbage
collected. The statement therefore corresponds to the following code:

xx = Chem.RWMol(molsin2[0])
xx.SetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT', str(5))
del xx
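
The same behaviour can be reproduced without RDKit. The class below is only a stand-in written for this illustration (it is not an RDKit class): its constructor copies the properties of the object it is given, which is the part of Chem.RWMol(mol) that matters here.

```python
class Mol:
    """Tiny stand-in for a molecule that carries string properties."""

    def __init__(self, other=None):
        # Copy-construct: the new object gets its own property dictionary.
        self._props = dict(other._props) if other is not None else {}

    def SetProp(self, key, value):
        self._props[key] = value

    def GetProp(self, key):
        return self._props[key]


original = Mol()
original.SetProp("COUNT", "0")

# The property is set on a temporary copy that is discarded immediately.
Mol(original).SetProp("COUNT", "5")
print(original.GetProp("COUNT"))  # 0

# Keeping a reference to the copy makes the change visible on the copy only.
kept = Mol(original)
kept.SetProp("COUNT", "5")
print(kept.GetProp("COUNT"))      # 5
print(original.GetProp("COUNT"))  # 0

# Setting the property on the original object itself is what changes it.
original.SetProp("COUNT", "5")
print(original.GetProp("COUNT"))  # 5
```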

I hope this clarifies your question.

Best

Peter



On Mon, 18 Jan 2016 at 4:24 AM chris dalton  wrote:

> Hi,
> I am changing the value of a property in a mol object in a SD supplier and
> I see the results below. I can only change the value of the property when I
> make an 'intermediate' variable. I don't understand why this is the case
> and is there a way I can directly change the value in the molecules in the
> supplier without doing this intermediate step?
>
> thanks
> Chris
>
> Chem.RWMol(molsin2[0]).GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')
> '0'
> Chem.RWMol(molsin2[0]).SetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT', str(5))
> Chem.RWMol(molsin2[0]).GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')
> '0'
>
> xx = Chem.RWMol(molsin2[0])
> xx.GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')
> '0'
> xx.SetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT', str(5))
> xx.GetProp('PUBCHEM_ATOM_DEF_STEREO_COUNT')
> '5'
>
> Chem.RWMol(molsin2[0])
> 
> >>> xx
> 
>
>
> --
> Site24x7 APM Insight: Get Deep Visibility into Application Performance
> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> Monitor end-to-end web transactions and take corrective actions now
> Troubleshoot faster and improve end-user experience. Signup Now!
> http://pubads.g.doubleclick.net/gampad/clk?id=267308311=/4140
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] trouble with SMARTs interpretation of 'not hydrogen'

2015-09-16 Thread Peter Gedeck
Hello

This may be just one example picked out of many, but why don't you just
make this atom an 'any atom'? It's in a ring, and hydrogens normally don't
occur in rings.

Best

Peter
On Thu, 17 Sep 2015 at 6:24 am, Andrew Dalke 
wrote:

> On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> > I am having trouble with RDKit correctly interpreting the SMARTS
> > character [!#1], which should be interpreted as "any atom not hydrogen".
>
> I've been looking at your emails but it's difficult for me to figure out
> what you are doing. Can you generate a smaller reproducible?
>
> My guess is that you are looking at the RDKit depiction of a molecule
> generated from a SMARTS string. This is a query molecule. As I recall,
> this is incomplete, and there is an open call out for someone interested in
> generating a better query depiction. If that's the case, then what you see
> is inability of the renderer to display a "not". This shouldn't affect the
> ability to match a molecule.
>
> I also don't understand this:
>
> > My SMARTS input:
> > [#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]
> >
> > Now when I do Chem.MolFromSmarts, my mol representation has hydrogens at
> those three positions, and as such I can't do sanitization of the molecule
> because since it has hydrogens in the !#1 positions, there is a valency
> conflict.
>
> It doesn't make sense to me to do sanitization of molecule that came from
> a SMARTS query.
>
> It looks like you have tried to convert a query-based molecule into a more
> chemical molecule. That is, I can reproduce some of what you report by
> using:
>
>   >>> from rdkit import Chem
>   >>> mol = Chem.MolFromSmarts("[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]")
>   >>> Chem.MolToSmiles(mol)
>   '[H]N1[H]=[H][H]=C(C#N)C1=S'
>
> This produces a nearly meaningless conversion. For example, consider:
>
>   >>> mol = Chem.MolFromSmarts("[#92,#93][$(N=N)]")
>   >>> Chem.MolToSmiles(mol)
>   '[*][U]'
>   >>> mol = Chem.MolFromSmarts("[#93,#92][$(N=N)]")
>   >>> Chem.MolToSmiles(mol)
>   '[*][Np]'
>
> When there is a choice of atoms, it picks the first, giving 'U' and 'Np'
> when I swap the two element numbers. And it shows a recursive SMARTS as a
> '*'.
>
> As far as I can tell, the "[!#1]" works correctly. Here's a case where it
> matches an 'N':
>
>   >>> pat = Chem.MolFromSmarts("C-[!#1]-C")
>
>   >>> mol = Chem.MolFromSmiles("CNC")
>   >>> mol.HasSubstructMatch(pat)
>   True
>
> RDKit won't parse a 2-valent hydrogen by default:
>
>   >>> mol = Chem.MolFromSmiles("C[H]C")
>   [00:15:07] Explicit valence for atom # 1 H, 2, is greater than permitted
>
> but if I disable sanitization, I can show that the pattern doesn't match
> this molecule:
>
>   >>> mol = Chem.MolFromSmiles("C[H]C", sanitize=False)
>   >>> mol.HasSubstructMatch(pat)
>   False
>
> And to double-check that the sanitize flag isn't doing something odd:
>
>   >>> mol = Chem.MolFromSmiles("C[N]C", sanitize=False)
>   >>> mol.HasSubstructMatch(pat)
>   True
>
> Since the SMARTS pattern doesn't work for you, but does seem to work for
> me, could you give a test case which is just the SMILES/SMARTS or
> molfile/SMARTS combination which gives the failure? That is, without the
> incomplete scaffolding that you showed.
>
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
> --
> Monitor Your Dynamic Infrastructure at Any Scale With Datadog!
> Get real-time metrics from all of your servers, apps and tools
> in one place.
> SourceForge users - Click here to start your Free Trial of Datadog now!
> http://pubads.g.doubleclick.net/gampad/clk?id=241902991=/4140
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Errors while running CTest

2014-09-24 Thread Peter Gedeck
Hello,

Running the tests creates the directory Testing/Temporary which contains a
file LastTest.log. This file is the actual output from the tests and may
help you identify the reason why your tests failed.
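
If the log is long, a few lines of Python can pull out the interesting parts. This is only a sketch: `lines_with_context` is a made-up helper, the keywords are guesses at what a failing test might print, and the sample text below is invented purely to exercise the function (it is not output from a real RDKit build).

```python
def lines_with_context(text, keywords, before=2):
    """Return (line_number, snippet) for each line containing a keyword.

    The snippet holds the matching line plus up to `before` preceding lines.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if any(keyword in line for keyword in keywords):
            start = max(0, i - before)
            hits.append((i + 1, "\n".join(lines[start:i + 1])))
    return hits


# Invented placeholder text, standing in for the contents of LastTest.log.
sample_log = """\
first test
Output:
Traceback (most recent call last):
Test Failed.
second test
Output:
Test Passed.
"""

for number, snippet in lines_with_context(sample_log, ("Traceback", "Failed")):
    print("line %d:" % number)
    print(snippet)
    print()
```

For the real file, read Testing/Temporary/LastTest.log from the build directory and pass its contents instead of `sample_log`.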

Best,

Peter



On 24 September 2014 17:22, Shantheya Balasupramaniam 
s.balasupraman...@tu-bs.de wrote:


  Dear Sir or Madam,

 I tried to install the RDKit on Ubuntu 14.04 (64bit):

 first I installed the pre-requisites by:

 sudo apt-get install flex bison build-essential python-numpy cmake python-dev \
   sqlite3 libsqlite3-dev libboost-dev libboost-python-dev libboost-regex-dev

 afterwards I downloaded RDKit_2014_03_1.tgz, unzipped it and copied the
 RDKit_2014_03_1 file to /opt

 then I edited the bash.bashrc in /etc by:

 export RDBASE=/opt/RDKit_2014_03_1
 export LD_LIBRARY_PATH=$RDBASE/lib:$LD_LIBRARY_PATH
 export PYTHONPATH=$RDBASE:$PYTHONPATH

 finally I compiled and installed RDKit by:

  cd $RDBASE
 mkdir build
 cd build
 cmake ..
 make # -j 4
  make install

 when I did the ctest as sudo I got this:

 Test project /opt/RDKit_2014_03_1/build
       Start  1: testDict
  1/78 Test  #1: testDict .   Passed    0.03 sec
       Start  2: testDataStructs
  2/78 Test  #2: testDataStructs ..***Exception: Other  0.19 sec
       Start  3: pyBV
  3/78 Test  #3: pyBV .***Failed    0.01 sec
       Start  4: pyDiscreteValueVect
  4/78 Test  #4: pyDiscreteValueVect ..***Failed    0.01 sec
       Start  5: pySparseIntVect
  5/78 Test  #5: pySparseIntVect ..***Failed    0.01 sec
       Start  6: testTransforms
  6/78 Test  #6: testTransforms ...   Passed    0.01 sec
       Start  7: testGrid
  7/78 Test  #7: testGrid .***Exception: Other  0.19 sec
       Start  8: testPyGeometry
  8/78 Test  #8: testPyGeometry ...***Failed    0.01 sec
       Start  9: testMatrices
  9/78 Test  #9: testMatrices .   Passed    0.00 sec
       Start 10: testAlignment
 10/78 Test #10: testAlignment    Passed    0.00 sec
       Start 11: pyAlignment
 11/78 Test #11: pyAlignment ..***Failed    0.01 sec
       Start 12: testOptimizer
 12/78 Test #12: testOptimizer    Passed    0.00 sec
       Start 13: testUFFForceField
 13/78 Test #13: testUFFForceField    Passed    0.04 sec
       Start 14: testMMFFForceField
 14/78 Test #14: testMMFFForceField ...***Exception: Other  0.20 sec
       Start 15: pyForceFieldConstraints
 15/78 Test #15: pyForceFieldConstraints ..***Failed    0.01 sec
       Start 16: testDistGeom
 16/78 Test #16: testDistGeom .   Passed    0.00 sec
       Start 17: pyDistGeom
 17/78 Test #17: pyDistGeom ...***Failed    0.01 sec
       Start 18: graphmolTest1
 18/78 Test #18: graphmolTest1    Passed    0.11 sec
       Start 19: graphmolcpTest
 19/78 Test #19: graphmolcpTest ...   Passed    0.01 sec
       Start 20: graphmolqueryTest
 20/78 Test #20: graphmolqueryTest    Passed    0.01 sec
       Start 21: graphmolMolOpsTest
 21/78 Test #21: graphmolMolOpsTest ...***Exception: SegFault  0.31 sec
       Start 22: graphmoltestCanon
 22/78 Test #22: graphmoltestCanon    Passed    0.00 sec
       Start 23: graphmoltestChirality
 23/78 Test #23: graphmoltestChirality ***Exception: Other  0.22 sec
       Start 24: graphmoltestPickler
 24/78 Test #24: graphmoltestPickler ..***Exception: Other  0.18 sec
       Start 25: graphmolIterTest
 25/78 Test #25: graphmolIterTest .   Passed    0.01 sec
       Start 26: testDepictor
 26/78 Test #26: testDepictor .***Exception: Other  0.18 sec
       Start 27: pyDepictor
 27/78 Test #27: pyDepictor ...***Failed    0.01 sec
       Start 28: smiTest1
 28/78 Test #28: smiTest1 .   Passed    0.21 sec
       Start 29: smaTest1
 29/78 Test #29: smaTest1 .   Passed    0.88 sec
       Start 30: fileParsersTest1
 30/78 Test #30: fileParsersTest1 .***Exception: Other  0.18 sec
       Start 31: testMolSupplier
 31/78 Test #31: testMolSupplier ..***Exception: Other  0.18 sec
       Start 32: testMolWriter
 32/78 Test #32: testMolWriter ***Exception: Other  0.18 sec
       Start 33: testTplParser
 33/78 Test #33: testTplParser ***Exception: Other  0.18 sec
       Start 34: testMol2ToMol
 34/78 Test #34: testMol2ToMol ***Exception: Other  0.19 sec
       Start 35: testSubstructMatch
 35/78 Test #35: testSubstructMatch ...   Passed    0.01 sec
       Start 36: testReaction
 36/78 Test #36: testReaction .***Exception: Other  0.27 sec
       Start 37: pyChemReactions
 37/78 

Re: [Rdkit-discuss] MMFFGetMoleculeProperties()

2014-05-05 Thread Peter Gedeck
Hello,

I searched through the source code for MMFFGetMoleculeProperties and found
a few test files. The method MMFFGetMoleculeProperties is part of the
ChemicalForceFields module:

from rdkit.Chem import ChemicalForceFields

  def testMMFFAngleConstraints(self) :
    m = Chem.MolFromMolBlock(self.molB, True, False)
    mp = ChemicalForceFields.MMFFGetMoleculeProperties(m)

So adapt your code as above and it should work (I have not tried it, but the
tests pass).
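
If it is unclear which module exposes the function in a given installation, a quick check along the following lines may help. This is only a diagnostic sketch: the helper name find_providers is made up for illustration, and the candidate module names are the ones mentioned in this thread.

```python
import importlib


def find_providers(attribute, module_names):
    """Return the names from module_names whose module exposes `attribute`."""
    providers = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not available in this installation
        if hasattr(module, attribute):
            providers.append(name)
    return providers


# Example with the modules discussed in this thread:
# candidates = ["rdkit.Chem.AllChem",
#               "rdkit.Chem.ChemicalForceFields",
#               "rdkit.Chem.rdForceFieldHelpers"]
# print(find_providers("MMFFGetMoleculeProperties", candidates))
```

An empty result would suggest that the installed RDKit version does not provide the function at all, rather than that the import is wrong.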

Best,

Peter



On 5 May 2014 21:26, casyo...@zedat.fu-berlin.de wrote:

 Hi to all list members!

 I want to use the method MMFFGetMoleculeProperties() (here is a link to the
 documentation:
 http://www.rdkit.org/Python_Docs/rdkit.Chem.rdForceFieldHelpers-module.html#MMFFGetMoleculeProperties
 ).

 Here is my very simple RDKit code:
 from rdkit import Chem
 from rdkit.Chem import AllChem
 from sys import argv

 suppl = Chem.SDMolSupplier(somepath)

 for mol in suppl:
     mp = AllChem.MMFFGetMoleculeProperties(mol)

 I always receive the following error:
 Traceback (most recent call last):
   File "forcefield.py", line 8, in <module>
     mp = AllChem.MMFFGetMoleculeProperties(mol)
 AttributeError: 'module' object has no attribute 'MMFFGetMoleculeProperties'

 I have tried importing another module, ChemicalForceFields, and calling
 the same line
 mp=ChemicalForceFields.MMFFGetMoleculeProperties(mol)
 but I got the same error.

 What is wrong with my code?

 Thank you all in advance!

 Best,
 Ani






Re: [Rdkit-discuss] Problem reading a specific smiles with the cartridge

2014-03-24 Thread Peter Gedeck
Hi

If we had known that the helpdesk advice would work here, ...   ;-)

Best,

Peter



On 24 March 2014 19:05, Gerebtzoff, Gregori gregori.gerebtz...@roche.com wrote:

 Hi guys,

 Many thanks for your help and suggestions!
 Don't ask me why, but restarting PostgreSQL did the trick; now my C12CC(C1)C2
 SMILES can be read correctly.
 = select mol_from_smiles('C12CC(C1)C2');
  mol_from_smiles
 -----------------
  C1C2CC1C2
 (1 row)

 Maybe the DB was somehow corrupted, since I got subsequent warnings like
 "null argument to internal routine".
 Sorry to have bothered you with that!

 Grégori



 On 23 March 2014 10:30, Greg Landrum greg.land...@gmail.com wrote:



 On Saturday, March 22, 2014, Gerebtzoff, Gregori 
 gregori.gerebtz...@roche.com wrote:

 Hi Greg,

 It's just that particular smiles, I don't have any problem reading
 thousands of other smiles and loading them in the cartridge.
 Which version of the cartridge do you use?


 I was just testing against the svn version.
 I don't recall having made any modifications in the SMILES parser that
 would lead to this behavior, but obviously something is going on.

 Peter's suggestion to try another form of the same SMILES (to check if
 it's the molecule and not the SMILES) is a very good one.

 -greg









Re: [Rdkit-discuss] Problem reading a specific smiles with the cartridge

2014-03-22 Thread Peter Gedeck
Hello

Can you construct similar SMILES like

 C12(C1)C2
 N12CCC(C1)C2
 C1(CCC2)CCC12

Are other SMILES in your dataset bicyclic systems?

Does it work with the rewritten SMILES C1C2CC1C2?
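
The checks suggested above can be run in one go with a small helper like the one below. It is only a sketch: probe_smiles is a made-up name, and the parse argument stands for whatever is being tested. By default it falls back to RDKit's Chem.MolFromSmiles; to test the cartridge one would pass a function that sends the string to mol_from_smiles through a database cursor (not shown here, since the connection details are specific to each setup).

```python
def probe_smiles(variants, parse=None):
    """Map each SMILES string to True if parse() accepts it, False otherwise."""
    if parse is None:
        # Imported lazily so the helper also works with any other parser callable.
        from rdkit import Chem
        parse = Chem.MolFromSmiles
    results = {}
    for smiles in variants:
        try:
            # A parser may signal failure by returning None or by raising.
            results[smiles] = parse(smiles) is not None
        except Exception:
            results[smiles] = False
    return results


# Example with the original string and its rewritten form:
# print(probe_smiles(["C12CC(C1)C2", "C1C2CC1C2"]))
```

If only one form of the same molecule fails, the problem is likely in how the string is read; if every form fails, the molecule itself or the installation is the more likely culprit.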




Best

Peter

On Sat, Mar 22, 2014 at 7:42 pm, Gerebtzoff, Gregori
gregori.gerebtz...@roche.com wrote:

Hi Greg,

It's just that particular SMILES; I don't have any problem reading thousands of
other SMILES and loading them in the cartridge.

Which version of the cartridge do you use?

Gregori

On Saturday, March 22, 2014, Greg Landrum greg.land...@gmail.com wrote:

Hi Grégori,

It doesn't seem to be a problem with the cartridge itself:

chembl_16=# select mol_from_smiles('C12CC(C1)C2');
 mol_from_smiles
-----------------
 C1C2CC1C2
(1 row)

I can also use it from psycopg2 without problems.

Can you read other SMILES or is it just that one that's problematic?

-greg

On Fri, Mar 21, 2014 at 6:30 PM, Gerebtzoff, Gregori
gregori.gerebtz...@roche.com wrote:

Hi guys,

I've been having a problem reading this particular SMILES string with the
PostgreSQL cartridge: C12CC(C1)C2

I don't know if I'm running the latest version of the cartridge though...

Thanks for your help!

Grégori

>>> cursor.execute("select rdkit_version()")
>>> cursor.fetchone()
['0.70.0']

>>> cursor.execute("select mol_from_smiles('C12CC(C1)C2')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/apps64/python/lib/python2.7/site-packages/psycopg2/extras.py", line 122, in execute
    return _cursor.execute(self, query, vars)

>>> import rdkit
>>> from rdkit import Chem, rdBase
>>> rdBase.rdkitVersion
'2013.09.2'

>>> mol = Chem.MolFromSmiles('C12CC(C1)C2')
<rdkit.Chem.rdchem.Mol object at 0x1ebd7a60>

>>> Chem.MolToSmiles(mol)
'C1C2CC1C2'


[Rdkit-discuss] One flavour of mcss

2012-12-11 Thread Peter Gedeck
Hello

 given a data set of let's say 2000 compounds, how do I extract the most
 common substructures rather than the maximum common substructures?
 In addition, I would like to output the frequency of the found

One approach would be to take a BRICS decomposition where you keep the full
decomposition hierarchy of a structure. You can then just count the
fragments in your data set to get the frequencies. As the BRICS decomposition
is done in Python, it's not particularly fast (not interactive speed), but
for 2000 compounds it's OK.

The nice thing about the BRICS fragments is that chemists will like them.
I would terminate the decomposition at a fragment size of 3 to avoid
getting single atoms. Check the arguments of the BRICS decomposition
function for ways to do this.
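
A minimal sketch of this counting approach is shown below. The helper names are made up for illustration. It assumes that the minFragmentSize and keepNonLeafNodes arguments of BRICS.BRICSDecompose are the ones meant above (the latter to keep the intermediate fragments of the hierarchy), and it counts each fragment at most once per compound, which is one possible reading of "frequency".

```python
from collections import Counter


def brics_fragments(smiles, min_fragment_size=3):
    """Return the set of BRICS fragment SMILES for one compound (needs RDKit)."""
    # Imported lazily so that the counting logic below can be used on its own.
    from rdkit import Chem
    from rdkit.Chem import BRICS

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES: contribute no fragments
        return set()
    return set(BRICS.BRICSDecompose(mol,
                                    minFragmentSize=min_fragment_size,
                                    keepNonLeafNodes=True))


def fragment_frequencies(smiles_list, decompose=brics_fragments):
    """Count in how many compounds each fragment occurs."""
    counts = Counter()
    for smiles in smiles_list:
        # set() ensures a fragment is counted once per compound.
        counts.update(set(decompose(smiles)))
    return counts


# Example: the ten most frequent fragments of a list of SMILES strings
# for fragment, n in fragment_frequencies(my_smiles).most_common(10):
#     print(n, fragment)
```

Here my_smiles is a placeholder for the data set. The decompose argument makes it easy to swap in a different fragmentation scheme without changing the counting.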

Best,

Peter