This blog post links to two others that may have done that (I haven't read
them carefully):
https://baoilleach.blogspot.com/2018/06/cheminformatics-for-deep-learners.html
Best regards, Jan
On 06 Aug 2018, at 11:57, Guillaume GODIN <guillaume.go...@firmenich.com> wrote:
Dear Greg,
Fantastic, thank you for giving both an explanation and a solution to this
“simple question”. I know it is not so simple, and it is fundamental for data
augmentation in deep learning.
If I may, I have a related question: do you know if anyone has worked on
a generator of all unique SMILES independently of RDKit?
Thanks again,
Guillaume
From: Greg Landrum <greg.land...@gmail.com>
Date: Monday, 6 August 2018 at 11:40
To: Guillaume GODIN <guillaume.go...@firmenich.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] enumeration of smiles question
On Thu, Aug 2, 2018 at 8:59 AM Guillaume GODIN <guillaume.go...@firmenich.com> wrote:
I have a simple question about generating all possible smiles of a given
molecule:
It's a simple question, but the answer is somewhat complicated. :-)
RDKit provides only 4 different SMILES for my molecule "CCC1CC1":
C1C(CC)C1
CCC1CC1
C1(CC)CC1
C(C)C1CC1
By hand, however, we can write these 7 SMILES:
CCC1CC1
C(C)C1CC1
C(C1CC1)C
C1CC(CC)1
C1C(CC)C1
C1CC1CC
C(CC)1CC1
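As a sanity check, the seven hand-written strings above can be confirmed to describe the same molecule without relying on RDKit. The following is only a minimal sketch: `parse_simple_smiles` and `same_graph` are hypothetical helper names, the parser assumes a SMILES made solely of aliphatic `C`, parentheses, and single-digit ring closures with single bonds, and the brute-force comparison is only practical for very small molecules.

```python
from itertools import permutations


def parse_simple_smiles(smiles):
    """Parse a SMILES containing only 'C', '(', ')' and single-digit ring closures.

    Returns (number_of_atoms, set_of_bonds); each bond is a frozenset of two atom indices.
    """
    bonds = set()
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> atom index that opened it
    prev = None         # atom the next atom or ring digit attaches to
    n_atoms = 0
    for ch in smiles:
        if ch == "C":
            idx = n_atoms
            n_atoms += 1
            if prev is not None:
                bonds.add(frozenset((prev, idx)))
            prev = idx
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            if not branch_stack:
                raise ValueError("unbalanced ')' in " + smiles)
            prev = branch_stack.pop()
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring digit before any atom in " + smiles)
            if ch in open_rings:
                bonds.add(frozenset((open_rings.pop(ch), prev)))
            else:
                open_rings[ch] = prev
        else:
            raise ValueError("unsupported character %r in %s" % (ch, smiles))
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch in " + smiles)
    return n_atoms, bonds


def same_graph(smiles_a, smiles_b):
    """Brute-force graph isomorphism test; only sensible for a handful of atoms."""
    n_a, bonds_a = parse_simple_smiles(smiles_a)
    n_b, bonds_b = parse_simple_smiles(smiles_b)
    if n_a != n_b or len(bonds_a) != len(bonds_b):
        return False
    for perm in permutations(range(n_a)):
        mapped = {frozenset(perm[i] for i in bond) for bond in bonds_a}
        if mapped == bonds_b:
            return True
    return False


hand_written = [
    "CCC1CC1", "C(C)C1CC1", "C(C1CC1)C", "C1CC(CC)1",
    "C1C(CC)C1", "C1CC1CC", "C(CC)1CC1",
]
print(all(same_graph("CCC1CC1", s) for s in hand_written))
```

Since all atoms here are carbon and all bonds are single, comparing the bare connectivity is enough; anything with heteroatoms, charges, or bond orders would need a real toolkit.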
I use this function for the enumeration:
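import numpy as np
from rdkit import Chem

def allsmiles(smil):
    # Return one randomly rooted, non-canonical SMILES for the input molecule.
    m = Chem.MolFromSmiles(smil)  # Construct a molecule from a SMILES string.
    if m is None:
        return smil
    N = m.GetNumAtoms()
    if N == 0:
        return smil
    try:
        # Random root atom index in [0, N); int() hands RDKit a plain Python int.
        n = int(np.random.randint(0, high=N))
        t = Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
    except Exception:
        return smil
    return t

n = 50
SMILES = ["CCC1CC1"]
SMILES_mult = [allsmiles(S) for S in SMILES for i in range(n)]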
Why can't we generate all 7 SMILES?
The RDKit has rules that it uses to decide which atom to branch to when
generating a SMILES. These are used regardless of whether you are generating
canonical SMILES or not.
The upshot of this is that it will never generate a SMILES where there's a
branch before a ring closure.
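To make that rule concrete, here is a rough sketch that flags hand-written SMILES in which a ring-closure digit directly follows a closing parenthesis. The helper name `has_ring_digit_after_branch` is hypothetical, and the pattern is a simplification: it ignores bond symbols placed before the digit and is not a general SMILES parser.

```python
import re

# A ring-closure digit (or a %nn closure) immediately after a closing parenthesis.
RING_DIGIT_AFTER_BRANCH = re.compile(r"\)(?:\d|%\d\d)")


def has_ring_digit_after_branch(smiles):
    """True if a branch is written before a ring-closure digit on the same atom."""
    return RING_DIGIT_AFTER_BRANCH.search(smiles) is not None


hand_written = [
    "CCC1CC1", "C(C)C1CC1", "C(C1CC1)C", "C1CC(CC)1",
    "C1C(CC)C1", "C1CC1CC", "C(CC)1CC1",
]
flagged = [s for s in hand_written if has_ring_digit_after_branch(s)]
print(flagged)
```

For the seven strings from the question, this flags `C1CC(CC)1` and `C(CC)1CC1`, the two forms that do not appear in any of the RDKit outputs shown in this thread.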
The other important factor here is that atom rank is determined by the index of
the atom in the molecule when you aren't using canonicalization. So changing
the atom order on input can help:
In [12]: set(allsmiles('CCC1CC1') for i in range(50))
Out[12]: {'C(C)C1CC1', 'C1(CC)CC1', 'C1C(CC)C1', 'CCC1CC1'}
In [13]: set(allsmiles('C1CC1CC') for i in range(50))
Out[13]: {'C(C1CC1)C', 'C1(CC)CC1', 'C1CC1CC', 'CCC1CC1'}
You can do this all at once as follows:
```
In [20]: def allsmiles(smil):
    ...:     m = Chem.MolFromSmiles(smil) # Construct a molecule from a SMILES string.
    ...:     if m is None:
    ...:         return smil
    ...:     N = m.GetNumAtoms()
    ...:     if N==0:
    ...:         return smil
    ...:     aids = list(range(N))
    ...:     random.shuffle(aids)
    ...:     m = Chem.RenumberAtoms(m,aids)
    ...:     try:
    ...:         n= random.randint(0,N-1)
    ...:         t= Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
    ...:     except :
    ...:         return smil
    ...:     return t
    ...:

In [21]: set(allsmiles('C1CC1CC') for i in range(50))
Out[21]: {'C(C)C1CC1', 'C(C1CC1)C', 'C1(CC)CC1', 'C1C(CC)C1', 'C1CC1CC', 'CCC1CC1'}
```
Note that I switched to using python's built in random module instead of using
the one in numpy.
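For data augmentation it may be convenient to collect the distinct results directly and make the sampling reproducible. The following is a sketch along the same lines as the function above, not a tested recipe: `enumerate_smiles`, `n_tries`, and `seed` are illustrative names, and because the sampling is random, the number of distinct strings found depends on how many tries are made.

```python
import random

from rdkit import Chem


def enumerate_smiles(smiles, n_tries=200, seed=0):
    """Collect distinct non-canonical SMILES by shuffling atom order and root atom."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or mol.GetNumAtoms() == 0:
        return set()
    n_atoms = mol.GetNumAtoms()
    found = set()
    for _ in range(n_tries):
        order = list(range(n_atoms))
        rng.shuffle(order)
        shuffled = Chem.RenumberAtoms(mol, order)  # new atom indices change the ranks
        root = rng.randrange(n_atoms)              # random starting atom
        found.add(Chem.MolToSmiles(shuffled, rootedAtAtom=root, canonical=False))
    return found


variants = enumerate_smiles("CCC1CC1")
# Every variant should parse back to the same canonical SMILES as the input.
reference = Chem.MolToSmiles(Chem.MolFromSmiles("CCC1CC1"))
print(all(Chem.MolToSmiles(Chem.MolFromSmiles(s)) == reference for s in variants))
```

Returning an empty set for unparsable input, rather than the input string itself, is a design choice of this sketch so that invalid SMILES are not silently passed into a training set.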
-greg
Thanks guys,
Best regards,
Guillaume
***********************************************************************************
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded
copies (which may contain alterations) subsequently transmitted from Firmenich,
are confidential and solely for the use of the intended recipient. The contents
do not represent the opinion of Firmenich except to the extent that it relates
to their official business.
***********************************************************************************
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss