[Rdkit-discuss] https://en.wikipedia.org/wiki/Hansen_solubility_parameter

2016-12-08 Thread Guillaume GODIN
Dear all,


I would like to know if anyone has an idea of how to determine the "real" 
fragment count of a molecule. By that I mean: find the highest-priority fragment, 
remove it from the molecule, and repeat until the molecule is empty.


The complex part is the proper enumeration of linear or branched 
alkane substituents:


iso_Bu, iso_Pr, ter_Bu, 2_Bu, CH2, CH2CH2, CH2CH2CH2, CH2CH2CH2CH2, CH3, 
Et, Pr, Bu


Here are a few examples:

Pentylamine, CCCCCN => CH2:1 & Bu:1 & NH2:1

Isopropyl Palmitate, CCCCCCCCCCCCCCCC(=O)OC(C)C => Bu:1 & iso_Pr:1 & 
CH2CH2CH2:1 & COO:1 & CH2CH2CH2CH2:2

Di-2-Ethylhexyl Ether, CCCCC(CC)COCC(CC)CCCC => CH2:2 & CH:2 & Bu:2 & Et:2 & O:1


Any ideas?

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded 
copies (which may contain alterations) subsequently transmitted from Firmenich, 
are confidential and solely for the use of the intended recipient. The contents 
do not represent the opinion of Firmenich except to the extent that it relates 
to their official business.
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] https://en.wikipedia.org/wiki/Hansen_solubility_parameter

2016-12-08 Thread Brian Cole
Hi Dr. Guillaume,

I played around with mapping a set of fragments onto molecules a
couple of months ago. The results of my experiments are here:
https://github.com/coleb/fragment_mapper

You give it a set of molecules and the fragments you would like to have
mapped. It tries to find the smallest set of fragments by trying the
largest first, using a greedy algorithm. It does fairly well at finding the
largest alkyl chain to cover parts of the molecule, but it is entirely
dependent on which fragments are in the input set. I was interested in
using this to determine how well fragment collections cover sets of molecules.
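A minimal sketch of the largest-first greedy idea, written directly with RDKit substructure matching. It is not the fragment_mapper code: the fragment names come from Guillaume's list, but the SMARTS patterns, the helper name `greedy_fragment_count`, and the tie-breaking (dictionary order within a size class, matches consumed from the lowest atom index) are assumptions chosen for illustration.

```python
from collections import Counter

from rdkit import Chem

# Hypothetical fragment set (name -> SMARTS), a subset of the names in the
# question. Within one size class, dictionary order acts as the priority.
FRAGMENTS = {
    "ter_Bu": "[CH3][CH0]([CH3])[CH3]",
    "Bu": "[CH3][CH2][CH2][CH2]",
    "CH2CH2CH2CH2": "[CH2][CH2][CH2][CH2]",
    "COO": "[CX3](=O)[OX2]",
    "iso_Pr": "[CH3][CH1][CH3]",
    "Pr": "[CH3][CH2][CH2]",
    "CH2CH2CH2": "[CH2][CH2][CH2]",
    "Et": "[CH3][CH2]",
    "CH2CH2": "[CH2][CH2]",
    "CH3": "[CH3]",
    "CH2": "[CH2]",
    "CH": "[CH1]",
    "NH2": "[NH2]",
    "O": "[OX2]",
}


def greedy_fragment_count(smiles, fragments=FRAGMENTS):
    """Assign each atom to at most one fragment, largest fragments first.

    Returns (Counter of fragment names, list of atom indices left unassigned).
    """
    mol = Chem.MolFromSmiles(smiles)
    queries = [(name, Chem.MolFromSmarts(sma)) for name, sma in fragments.items()]
    # largest queries first; list.sort() is stable, so ties keep dict order
    queries.sort(key=lambda item: item[1].GetNumAtoms(), reverse=True)

    claimed = set()
    counts = Counter()
    for name, query in queries:
        # visit matches starting from the lowest atom indices, so a long
        # chain is cut into consecutive pieces from one end
        for match in sorted(mol.GetSubstructMatches(query), key=sorted):
            if claimed.isdisjoint(match):
                claimed.update(match)
                counts[name] += 1
    unassigned = [a.GetIdx() for a in mol.GetAtoms() if a.GetIdx() not in claimed]
    return counts, unassigned
```

With this particular fragment set, the three example molecules from the question come out with the decompositions listed there; any atoms returned in `unassigned` play the role of the "not covered" part of Brian's reports. A greedy pass is not guaranteed to find the smallest cover in general, and changing the fragment order can change the answer.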

The scripts output reports of which fragments are mapped (or,
conversely, what is missing). I've attached example PDFs of those.

Let me know if you find it useful. The major drawback I've noticed while
experimenting is that it gets tripped up by tautomer changes from the
fragment to the molecule (I've been playing with a way to work around that
by trying out what Roger presented at the UGM). Also, it doesn't check the
bond orders between the fragments; that matters for my use case, but it
doesn't look like it does for yours.
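As an illustration only (not the UGM approach mentioned above), newer RDKit releases provide `rdMolStandardize.TautomerEnumerator`, and one simple workaround for tautomer mismatches is to put both the fragment and the molecule into RDKit's canonical tautomer before matching. The helper names below are hypothetical, and canonicalizing a fragment in isolation can pick a different tautomer than the same atoms would get inside a larger molecule, so this will not catch every case.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# one enumerator can be reused for many molecules
enumerator = rdMolStandardize.TautomerEnumerator()


def canonical_tautomer(smiles):
    """Return RDKit's canonical tautomer for a SMILES string (hypothetical helper)."""
    return enumerator.Canonicalize(Chem.MolFromSmiles(smiles))


def fragment_matches(fragment_smiles, molecule_smiles):
    """Substructure test after canonicalizing both sides (hypothetical helper)."""
    frag = canonical_tautomer(fragment_smiles)
    mol = canonical_tautomer(molecule_smiles)
    return mol.HasSubstructMatch(frag)


# 2-hydroxypyridine vs. 2-pyridone: same compound, different tautomer in the input
print(fragment_matches("Oc1ccccn1", "O=c1cccc[nH]1"))
```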

Cheers,
Brian

On Thu, Dec 8, 2016 at 2:43 AM, Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear all,
>
>
> I would like to know if anyone has an idea of how to determine the "real"
> fragment count of a molecule. By that I mean: find the highest-priority
> fragment, remove it from the molecule, and repeat until the molecule is empty.
>
>
> The complex part is the proper enumeration of linear or
> branched alkane substituents:
>
>
> iso_Bu, iso_Pr, ter_Bu, 2_Bu, CH2, CH2CH2, CH2CH2CH2, CH2CH2CH2CH2, CH3,
> Et, Pr, Bu
>
> Here are a few examples:
>
> Pentylamine, CCCCCN => CH2:1 & Bu:1 & NH2:1
>
> Isopropyl Palmitate, CCCCCCCCCCCCCCCC(=O)OC(C)C => Bu:1 & iso_Pr:1 &
> CH2CH2CH2:1 & COO:1 & CH2CH2CH2CH2:2
>
> Di-2-Ethylhexyl Ether, CCCCC(CC)COCC(CC)CCCC => CH2:2 & CH:2 & Bu:2
> & Et:2 & O:1
>
>
> Any ideas?


MappingNotFound.pdf
Description: Adobe PDF document


NotFullyCovered.pdf
Description: Adobe PDF document


Success.pdf
Description: Adobe PDF document


[Rdkit-discuss] Generating all stereochem possibilities from smile

2016-12-08 Thread James Johnson
Hello all, I am trying to generate R and S from: CCC(C)(Cl)Br

Below is my code for converting the SMILES to a mol file. Can someone give me some
guidance on generating all stereochemistry possibilities?

The code would also need to work for molecules with two stereocenters or stereogenic double bonds, such as:
RR, RS, SR, SS
or
RE, RZ, SE, SZ
etc.

Thanks!

Python Code:

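from rdkit import Chem
from rdkit.Chem import AllChem

smi = "CCC(C)(Cl)Br"
# parse the SMILES into a molecule (no coordinates yet)
uncharged_mol_1D = Chem.MolFromSmiles(smi)
# add explicit hydrogens before generating 3D coordinates
uncharged_mol_3D = Chem.AddHs(uncharged_mol_1D)
# generate one 3D conformer, then clean it up with the UFF force field
AllChem.EmbedMolecule(uncharged_mol_3D)
AllChem.UFFOptimizeMolecule(uncharged_mol_3D)
# write the result to a mol file
Chem.MolToMolFile(uncharged_mol_3D, "./test.mol")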
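One way to do this in recent RDKit releases is the `rdkit.Chem.EnumerateStereoisomers` module (whether it was available in the release current when this was asked is not certain). The sketch below enumerates the unassigned stereocenter of CCC(C)(Cl)Br, reports the CIP label of each isomer with `Chem.FindMolChiralCenters`, and then embeds each isomer as in the script above. The random seed and keeping mol blocks in memory instead of writing files are illustrative choices, not requirements.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)

smi = "CCC(C)(Cl)Br"
mol = Chem.MolFromSmiles(smi)

# enumerate every assignment of the unassigned stereocenters / double bonds;
# unique=True drops duplicate isomers
opts = StereoEnumerationOptions(unique=True)
isomers = list(EnumerateStereoisomers(mol, options=opts))

labels = []
mol_blocks = []
for iso in isomers:
    # CIP label(s) of the assigned stereocenters, as (atom index, 'R'/'S') pairs
    centers = Chem.FindMolChiralCenters(iso)
    labels.append(centers)
    print(Chem.MolToSmiles(iso), centers)
    # 3D coordinates as in the original script; embedding respects chiral tags
    iso_h = Chem.AddHs(iso)
    AllChem.EmbedMolecule(iso_h, randomSeed=42)
    AllChem.UFFOptimizeMolecule(iso_h)
    # Chem.MolToMolFile(iso_h, path) would write each isomer to disk instead
    mol_blocks.append(Chem.MolToMolBlock(iso_h))
```

The same call should enumerate all combinations when there are several unassigned stereocenters and/or stereogenic double bonds (RR/RS/SR/SS, R/E, ...), giving up to 2^n isomers for n independent stereo elements.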


Re: [Rdkit-discuss] Handling SDF with 'aromatic' bonds?

2016-12-08 Thread Greg Landrum
First the thing I always have to say:
According to the spec for mol blocks, aromatic bond orders are only
supposed to be used for queries.

Given the number of bogus mol files out there in the wild, the RDKit does
actually still read these:

In [49]: print(mb)

 RDKit  2D

  6  6  0  0  0  0  0  0  0  0999 V2000
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  6  4  0
  6  1  4  0
M  END


In [50]: nm = Chem.MolFromMolBlock(mb)

In [51]: Chem.MolToSmiles(nm)
Out[51]: 'c1ccccc1'


It sounds like the problem you are having is analogous to this one:

In [55]: print(mb)

 RDKit

  5  5  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  1  4  0
M  END


In [56]: nm = Chem.MolFromMolBlock(mb)
[04:56:04] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4


This is the same problem that the RDKit has processing the (bogus) SMILES
'c1cccn1' for pyrrole: the missing H specification causes problems. Same
thing with the (again bogus) SMILES for tetrazole that you provide.
There is no code in the RDKit to try and guess what the user means with
these poorly specified molecules.
There have been discussions about this in the past on the mailing list and
there are some links to those (but, strangely, no code) in the cookbook:
http://www.rdkit.org/docs/Cookbook.html#cleaning-up-heterocycles
That's probably a good place to start.
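Since the RDKit will not guess the missing hydrogens, here is a minimal trial-and-error sketch (not RDKit or Cookbook code): read the structure without sanitization, then add one H to each candidate aromatic nitrogen in turn until `Chem.SanitizeMol` succeeds. The function name and the candidate filter are assumptions; it returns the first tautomer that kekulizes, which is an arbitrary choice, and it only tries a single added H, so systems that need more than one (e.g. some fused heterocycles) would need a recursive version.

```python
from rdkit import Chem


def add_missing_aromatic_nh(mol):
    """Hypothetical helper: sanitize `mol`, or retry with one H added to each
    candidate aromatic N in turn. Returns the first sanitizable copy or None."""
    # the structure may already be fine (e.g. benzene with bond order 4)
    trial = Chem.Mol(mol)
    try:
        Chem.SanitizeMol(trial)
        return trial
    except Exception:  # kekulization failures raise an RDKit sanitization error
        pass

    # candidates: neutral, two-connected N atoms in aromatic bonds with no H
    candidates = [
        atom.GetIdx()
        for atom in mol.GetAtoms()
        if atom.GetAtomicNum() == 7
        and atom.GetFormalCharge() == 0
        and atom.GetDegree() == 2
        and atom.GetNumExplicitHs() == 0
        and any(b.GetBondType() == Chem.BondType.AROMATIC for b in atom.GetBonds())
    ]
    for idx in candidates:
        trial = Chem.Mol(mol)  # fresh copy: a failed sanitization may modify it
        atom = trial.GetAtomWithIdx(idx)
        atom.SetNoImplicit(True)
        atom.SetNumExplicitHs(1)
        try:
            Chem.SanitizeMol(trial)
            return trial
        except Exception:
            continue
    return None


# the bogus pyrrole SMILES from above; a bond-order-4 mol block can be read
# the same way with Chem.MolFromMolBlock(mb, sanitize=False)
fixed = add_missing_aromatic_nh(Chem.MolFromSmiles("c1cccn1", sanitize=False))
print(Chem.MolToSmiles(fixed))
```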

-greg






On Thu, Dec 8, 2016 at 5:36 PM, Brian Cole  wrote:

> Any advice on getting RDKit to read in SDF files that use bond order '4'
> to mark bonds as aromatic and don't have explicit hydrogen? For example,
> imagine two fused heterocycles where the hydrogen isn't really known. I
> have SDF files that just mark the bond orders as '4', aromatic, and don't
> even try to specify which tautomer is intended.
>
> Does this fall into the same category as OpenBabel considering c1nnnn1 to be
> tetrazole without specifying where the hydrogen is?
>
> Any tips for getting RDKit to input these structures and clean them up?
>
> Thanks,
> Brian
>