Re: [Rdkit-discuss] mass replacement of External R-groups with many substituents

2017-03-16 Thread Greg Landrum
Combining Steve's and Chris' answers gets to how I would do it with reactions:
In [17]: core = Chem.MolFromSmiles('*c1c(C)1(O)')
In [18]: chain = Chem.MolFromSmiles('CN*')
In [19]: rxn = AllChem.ReactionFromSmarts('[c:1][#0].[#0][*:2]>>[c:1]-[*:2]')
In [20]: ps =
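A self-contained sketch of the reaction-based approach in this message. The core SMILES used here ('*c1ccccc1') is a stand-in chosen for illustration, not the one from the message, and the sanitization step and variable names are assumptions about how one might finish the workflow.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative inputs (assumptions): a phenyl core and a methylamino chain,
# each carrying one dummy atom (*) that marks its attachment point.
core = Chem.MolFromSmiles('*c1ccccc1')
chain = Chem.MolFromSmiles('CN*')

# Reaction SMARTS from the message: bond the aromatic carbon next to the
# core's dummy atom to the atom next to the chain's dummy atom. The unmapped
# dummy atoms do not appear in the product.
rxn = AllChem.ReactionFromSmarts('[c:1][#0].[#0][*:2]>>[c:1]-[*:2]')

# RunReactants returns a tuple of product tuples, one per match combination.
product_sets = rxn.RunReactants((core, chain))

products = []
for product_set in product_sets:
    for prod in product_set:
        Chem.SanitizeMol(prod)  # reaction products come back unsanitized
        products.append(Chem.MolToSmiles(prod))
```

For mass replacement, the same `rxn` object could be reused in a loop over many chain molecules, each with a single dummy atom.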

Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Greg Landrum
I'm a bit confused by all this. The RDKit has Tanimoto (and a bunch of other similarity measures) built in:
In [6]: from rdkit import DataStructs
In [7]: fp1 = rdMolDescriptors.GetMorganFingerprintAsBitVect(theobromine,2,2048)
In [8]: fp2 =

Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread matthew
I don't think you even need to cast them to numpy arrays if you use scipy. It should be able to take bit arrays. Also, Jaccard distance is another name for Tanimoto distance. This simplifies the code above:
from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import

Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Curt Fischer
If you are looking for something quick and dirty, you could stay in numpy to calculate Tanimoto.
from __future__ import division
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
mol1 = Chem.MolFromSmiles('CCO')
mol2 = Chem.MolFromSmiles('CCC')
fp1 =
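A minimal sketch of the "stay in numpy" idea, assuming the fingerprints are already available as equal-length 0/1 arrays. The helper name `tanimoto`, the toy fingerprints, and the choice to return 0.0 for two all-zero fingerprints are assumptions made for illustration; this is not the code from the truncated message.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 arrays (hypothetical helper)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("fingerprints must have the same length")
    union = np.count_nonzero(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    # Shared on-bits divided by on-bits present in either fingerprint.
    return np.count_nonzero(a & b) / union

# Toy fingerprints (assumptions), standing in for real Morgan bit vectors.
fp1 = np.array([1, 1, 0, 0, 1, 0, 0, 0])
fp2 = np.array([1, 0, 0, 0, 1, 1, 0, 0])
similarity = tanimoto(fp1, fp2)  # 2 shared on-bits / 4 on-bits in the union
```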

Re: [Rdkit-discuss] mass replacement of External R-groups with many substituents

2017-03-16 Thread Stephen Roughley
You can match a dummy atom (*) with the SMARTS [#0].

Steve

On 16 Mar 2017 16:43, "Chris Earnshaw" wrote:
> Hi Brian
>
> I'm by no means an expert in RDKit with Python, but until someone else
> comes along, here are a few thoughts.
>
> Your reaction SMARTS specifically
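A short sketch of the [#0] tip, assuming a simple illustrative molecule ('*c1ccccc1') that is not taken from the thread. It finds the dummy atom and then the atom it is attached to.

```python
from rdkit import Chem

# Illustrative molecule (assumption): benzene with one dummy attachment point.
mol = Chem.MolFromSmiles('*c1ccccc1')

# [#0] matches atoms with atomic number 0, i.e. dummy atoms written as *.
dummy_query = Chem.MolFromSmarts('[#0]')
dummy_matches = mol.GetSubstructMatches(dummy_query)

# For each dummy atom, collect the indices of its neighbours: these are the
# atoms where a substituent would be attached.
attachment_atoms = [
    nbr.GetIdx()
    for (idx,) in dummy_matches
    for nbr in mol.GetAtomWithIdx(idx).GetNeighbors()
]
```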

Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Francois BERENGER
Hi, Here is a Python script that was created with the help of some rdkit wizards: https://github.com/UnixJunkie/mol2ecfp4 It works with unfolded ECFP4 fingerprints, so not exactly what you are looking for. There would be more modifications needed in order to fold the fingerprint to the desired

Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Francois BERENGER
I'll send a Python script. It works for .smi files. If anyone can adapt it to work on SDF files, that would be wonderful. Just give me 5 minutes to put it on GitHub.

On 03/16/2017 09:28 AM, Thomas Evangelidis wrote:
> Hello,
>
> I created a numpy array from a molecule using the following function:
>
>

Re: [Rdkit-discuss] mass replacement of External R-groups with many substituents

2017-03-16 Thread Chris Earnshaw
Hi Brian,

I'm by no means an expert in RDKit with Python, but until someone else comes along, here are a few thoughts. Your reaction SMARTS specifically defines aromatic carbons joined by single bonds, which won't match an incoming benzene ring, and it's a bit redundant to specify that aromatic

[Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Thomas Evangelidis
Hello,

I created a numpy array from a molecule using the following function: AllChem.GetMorganFingerprintAsBitVect(). Now I would like to convert the numpy array back to a bit vector in order to calculate the Tanimoto similarity of two compounds. Is this possible?

Thanks,
Thomas
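One possible way to do the conversion asked about here, sketched under the assumption that the numpy array holds one 0/1 entry per fingerprint bit. The helper name `numpy_to_bitvect` and the toy arrays are hypothetical; `ExplicitBitVect`, `SetBit`, and `TanimotoSimilarity` are RDKit's own.

```python
import numpy as np
from rdkit import DataStructs

def numpy_to_bitvect(arr):
    """Build an RDKit ExplicitBitVect from a 0/1 numpy array (hypothetical helper)."""
    arr = np.asarray(arr).ravel()
    bv = DataStructs.ExplicitBitVect(int(arr.size))
    # Switch on every position that is non-zero in the array.
    for idx in np.flatnonzero(arr):
        bv.SetBit(int(idx))
    return bv

# Toy arrays (assumptions), standing in for arrays made from real fingerprints.
arr1 = np.array([1, 1, 0, 0, 1, 0, 0, 0])
arr2 = np.array([1, 0, 0, 0, 1, 1, 0, 0])

bv1 = numpy_to_bitvect(arr1)
bv2 = numpy_to_bitvect(arr2)
similarity = DataStructs.TanimotoSimilarity(bv1, bv2)
```

If the original bit vectors are still available, calling `DataStructs.TanimotoSimilarity` on them directly avoids the round trip through numpy.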