[Rdkit-discuss] https://en.wikipedia.org/wiki/Hansen_solubility_parameter
Dear all,

I would like to know if you have an idea of how to determine the "real" fragment count in a molecule. By that I mean: find the fragment with the highest priority, remove it from the molecule, and continue until the molecule is empty.

The complex part is the proper enumeration of linear or branched alkane substituents:

iso_Bu, iso_Pr, ter_Bu, 2_Bu, CH2, CH2CH2, CH2CH2CH2, CH2CH2CH2CH2, CH3, CH3, Et, Pr, Bu

Here are a few examples:

Pentylamine, CN => CH2:1 & Bu:1 & NH2:1
Isopropyl Palmitate, (=O)OC(C)C => Bu:1 & iso_Pr:1 & CH2CH2CH2:1 & COO:1 & CH2CH2CH2CH2:2
Di-2-Ethylhexyl Ether, C(CC)COCC(CC) => CH2:2 & CH:2 & Bu:2 & Et:2 & O:1

Any ideas?

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8

**
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded copies (which may contain alterations) subsequently transmitted from Firmenich, are confidential and solely for the use of the intended recipient. The contents do not represent the opinion of Firmenich except to the extent that it relates to their official business.
**

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] https://en.wikipedia.org/wiki/Hansen_solubility_parameter
Hi Dr. Guillaume,

I played around with mapping a set of fragments onto molecules a couple of months ago. The results of my experiments are here:

https://github.com/coleb/fragment_mapper

You give it a set of molecules and the fragments you would like to have mapped. It tries to find the smallest set of fragments with a greedy algorithm that tries the largest fragments first. It does fairly well at finding the largest alkyl chain that satisfies a part of the molecule, but it is entirely dependent on which fragments are in the input set.

I was interested in using this to determine how well fragment collections cover sets of molecules. The scripts output reports of which fragments are mapped (or, conversely, what is missing). I am attaching example PDFs of those reports. Let me know if you find it useful.

The major drawbacks I've noticed in my experimenting:

- It gets tripped up by tautomer changes between the fragment and the molecule (I have been playing with a way to work around that by trying out what Roger presented at the UGM).
- It doesn't check the bond orders between the fragments, which matters for my use case but doesn't look like it does for yours.

Cheers,
Brian

On Thu, Dec 8, 2016 at 2:43 AM, Guillaume GODIN <guillaume.go...@firmenich.com> wrote:

> Dear all,
>
> I would like to know if you have an idea on how to determine the "real"
> fragment count in a molecule. I mean find one fragment with priority and
> remove it from the molecule and continue until the molecule was empty.

Attachments (Adobe PDF documents): MappingNotFound.pdf, NotFullyCovered.pdf, Success.pdf
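The largest-first greedy idea described in this reply can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the fragment_mapper repository: the function name `greedy_cover` is hypothetical, and the candidate matches are written by hand as atom-index tuples for a six-heavy-atom chain (five carbons and a nitrogen, as in pentylamine). In practice they would presumably come from a substructure search.

```python
from typing import Dict, Iterable, Set, Tuple

# One candidate placement of a fragment: (fragment name, atom indices it covers).
Match = Tuple[str, Tuple[int, ...]]


def greedy_cover(n_atoms: int, matches: Iterable[Match]) -> Tuple[Dict[str, int], Set[int]]:
    """Accept non-overlapping matches, largest first.

    Returns the fragment counts and the set of atom indices left uncovered.
    """
    covered: Set[int] = set()
    counts: Dict[str, int] = {}
    # Largest fragments first; ties are broken by name and atom indices
    # so that the result is deterministic.
    ordered = sorted(matches, key=lambda m: (-len(m[1]), m[0], m[1]))
    for name, atoms in ordered:
        if covered.isdisjoint(atoms):
            covered.update(atoms)
            counts[name] = counts.get(name, 0) + 1
    return counts, set(range(n_atoms)) - covered


# Hand-written (hypothetical) matches: atoms 0-4 are carbons, atom 5 is nitrogen.
matches = [
    ("Bu", (0, 1, 2, 3)), ("Pr", (0, 1, 2)), ("Et", (0, 1)), ("CH3", (0,)),
    ("CH2", (1,)), ("CH2", (2,)), ("CH2", (3,)), ("CH2", (4,)), ("NH2", (5,)),
]
counts, uncovered = greedy_cover(6, matches)
print(counts, uncovered)
```

With these inputs the counts agree with the pentylamine example in the original question (Bu:1, CH2:1, NH2:1). A non-empty `uncovered` set would correspond to the "not fully covered" case mentioned above. Like any greedy scheme, this does not guarantee the smallest possible set of fragments.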
[Rdkit-discuss] Generating all stereochem possibilities from smile
Hello all,

I am trying to generate the R and S forms of: CCC(C)(Cl)Br

Below is the code I use to convert the SMILES to a mol file. Can someone give me some guidance on generating all stereochemistry possibilities? The code would also need to work for two stereo elements, such as RR, RS, SR, SS or RE, RZ, SE, SZ, etc.

Thanks!

Python Code:

from rdkit import Chem
from rdkit.Chem import AllChem

smi = "CCC(C)(Cl)Br"
# Parse the SMILES (no coordinates yet)
uncharged_mol_1D = Chem.MolFromSmiles(smi)
# Add explicit hydrogens before generating 3D coordinates
uncharged_mol_3D = Chem.AddHs(uncharged_mol_1D)
# Embed a 3D conformer and relax it with the UFF force field
AllChem.EmbedMolecule(uncharged_mol_3D)
AllChem.UFFOptimizeMolecule(uncharged_mol_3D)
Chem.MolToMolFile(uncharged_mol_3D, "./test.mol")
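One way to picture the enumeration being asked for is as a Cartesian product over the two SMILES chirality tags, `@` and `@@`, at each tetrahedral center. The sketch below is plain Python and rests on several assumptions: the stereocenters are marked by hand with `{}` placeholders, the helper name `enumerate_chirality` is hypothetical, the second molecule is an invented example, and nothing here assigns R/S labels or handles E/Z double bonds. Newer RDKit releases also include an `rdkit.Chem.EnumerateStereoisomers` module that works directly on molecule objects, which is probably the more robust route where it is available.

```python
from itertools import product
from typing import List


def enumerate_chirality(template: str, n_centers: int) -> List[str]:
    """Fill each '{}' placeholder in a SMILES template with '@' or '@@'."""
    if template.count("{}") != n_centers:
        raise ValueError("template must contain one '{}' per stereocenter")
    # product(...) yields every combination of tags, 2**n_centers in total.
    return [template.format(*tags) for tags in product(("@", "@@"), repeat=n_centers)]


# One stereocenter: the molecule from the question, CCC(C)(Cl)Br.
one_center = enumerate_chirality("CC[C{}](C)(Cl)Br", 1)
print(one_center)

# Two stereocenters (invented example molecule): four combinations.
two_centers = enumerate_chirality("C[C{}H](Cl)[C{}H](F)Br", 2)
print(two_centers)
```

Each resulting SMILES could then be passed through the embedding code from the question to write one mol file per stereoisomer.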
Re: [Rdkit-discuss] Handling SDF with 'aromatic' bonds?
First, the thing I always have to say: according to the spec for mol blocks, aromatic bond orders are only supposed to be used for queries.

Given the number of bogus mol files out there in the wild, the RDKit does actually still read these:

In [49]: print(mb)

     RDKit          2D

  6  6  0  0  0  0  0  0  0  0999 V2000
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  6  4  0
  6  1  4  0
M  END

In [50]: nm = Chem.MolFromMolBlock(mb)

In [51]: Chem.MolToSmiles(nm)
Out[51]: 'c1ccccc1'

It sounds like the problem you are having is analogous to this one:

In [55]: print(mb)

     RDKit

  5  5  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  1  4  0
M  END

In [56]: nm = Chem.MolFromMolBlock(mb)
[04:56:04] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4

This is the same problem that the RDKit has processing the (bogus) SMILES 'c1cccn1' for pyrrole: the missing H specification causes problems. Same thing with the (again bogus) SMILES for tetrazole that you provide.

There is no code in the RDKit to try and guess what the user means with these poorly specified molecules. There have been discussions about this in the past on the mailing list, and there are some links to those (but, strangely, no code) in the cookbook:

http://www.rdkit.org/docs/Cookbook.html#cleaning-up-heterocycles

That's probably a good place to start.

-greg

On Thu, Dec 8, 2016 at 5:36 PM, Brian Cole wrote:

> Any advice on getting RDKit to read in SDF files that use bond order '4'
> to mark bonds as aromatic and don't have explicit hydrogen? For example,
> imagine two fused heterocycles where the hydrogen isn't really known. I
> have SDF files that just mark the bond orders as '4', aromatic, and don't
> even try to specify which tautomer it wants to represent.
>
> Does this enter the same category as OpenBabel considering c11 to be
> tetrazole and not specifying where the hydrogen is?
>
> Any tips for getting RDKit to input these structures and clean them up?
>
> Thanks,
> Brian
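Why the missing hydrogen matters can be illustrated with a deliberately simplified model of kekulization: every aromatic atom that has to take part in a double bond must be paired with exactly one such neighbor, which is a perfect-matching problem. The sketch below is plain Python and is not the RDKit's actual algorithm. The function name `has_kekule_structure` is hypothetical, and deciding which atoms need a double bond is done by hand.

```python
from typing import Dict, List, Set, Tuple


def has_kekule_structure(bonds: List[Tuple[int, int]], needs_double: Set[int]) -> bool:
    """Return True if every atom in needs_double can get exactly one double bond.

    This is a backtracking perfect-matching search restricted to the atoms
    in needs_double; it is fine for small rings, not tuned for large systems.
    """
    neighbors: Dict[int, Set[int]] = {a: set() for a in needs_double}
    for a, b in bonds:
        if a in needs_double and b in needs_double:
            neighbors[a].add(b)
            neighbors[b].add(a)

    def match(unmatched: frozenset) -> bool:
        if not unmatched:
            return True
        atom = min(unmatched)
        # Try pairing the lowest-numbered unmatched atom with each free neighbor.
        for other in neighbors[atom] & unmatched:
            if match(unmatched - {atom, other}):
                return True
        return False

    return match(frozenset(needs_double))


# Five-membered ring; atom 4 plays the role of the nitrogen.
ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# Like 'c1cccn1': no H on N, so all five ring atoms need a double bond.
no_h = has_kekule_structure(ring5, {0, 1, 2, 3, 4})
# Like 'c1ccc[nH]1': the N-H takes no double bond, only the four carbons do.
with_h = has_kekule_structure(ring5, {0, 1, 2, 3})
print(no_h, with_h)
```

Five atoms cannot be paired off two at a time, so the first case has no valid Kekulé form, which mirrors the "Can't kekulize mol" message above. Stating the hydrogen explicitly leaves four carbons, and those pair up into two double bonds.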