Re: [Rdkit-discuss] Explicit valence error when reading sdf files
On 11 July 2014 23:41, Wendy Carande wcara...@gmail.com wrote:

10104489
  TRC 05231419153D
PM6 optimization, min free energy conformation
 14 14  0  0  0  0  0  0  0  0999 V2000
   -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1378   -1.2635   -0.2668 N   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1710    0.0301   -0.5321 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  3  1  0  0  0  0
  3  5  1  0  0  0  0
  4  2  1  0  0  0  0
  6  3  2  0  0  0  0
  6  9  1  0  0  0  0
  7  4  2  0  0  0  0
  7 10  2  0  0  0  0
  8  4  1  0  0  0  0
  8 11  1  0  0  0  0
 10  6  1  0  0  0  0
 12  8  1  0  0  0  0
 13  8  1  0  0  0  0
 14 10  1  0  0  0  0
M  RAD  1   2   2
M  END

This is not a problem with RDKit but a chemistry problem. Your structure has a tetravalent N: the uncharged ring nitrogen (atom 7, double-bonded to atoms 4 and 10) has four bonds. If you add a + charge to the nitrogen (an M  CHG line in the SDF; see below), RDKit is able to read in your structure. You can easily do this with a free program such as MarvinSketch, which also shows you where the error in your original structure is.
Python code:

>>> import rdkit
>>> from rdkit import Chem
>>> s = Chem.SDMolSupplier('/tmp/test_fixed.sdf')
>>> next(s)
<rdkit.Chem.rdchem.Mol object at 0x7fe783d21360>

Fixed SDF file:

  Mrv0541 07121410173D -76.23192
PM6 optimization, min free energy conformation
 14 14  0  0  0  0  0  0  0  0999 V2000
   -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1378   -1.2635   -0.2668 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.1710    0.0301   -0.5321 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  3  1  0  0  0  0
  3  5  1  0  0  0  0
  4  2  1  0  0  0  0
  6  3  2  0  0  0  0
  6  9  1  0  0  0  0
  7  4  2  0  0  0  0
  7 10  2  0  0  0  0
  8  4  1  0  0  0  0
  8 11  1  0  0  0  0
 10  6  1  0  0  0  0
 12  8  1  0  0  0  0
 13  8  1  0  0  0  0
 14 10  1  0  0  0  0
M  CHG  1   7   1
M  RAD  1   2   2
M  END
>  <Symmetry>
Cs(1)

>  <FreeEnergy>
-105.218958525368

>  <Freq>
36.3827 1.8490 118.9528 0.6797 121.3287 1.7880 146.7422 3.2258 230.2547 4.6726 300.5702 5.7138 328.8019 1.4348 361.6117 0.5034 402.5823 0.1183 552.4995 20.1778 573.7578 0.7088 621.4207 13.4753 682.9353 27.8339 701.6618 5.4059 844.3396 76.4557 881.5112 51.8745 935.0009 0.0366 986.5135 4.6250 1020.1213 12.4436 1073.2578 27.3170 1132.0055 17.4835 1149.0508 5.7188 1174.1069 3.8183 1193.8903 14.3170 1225.8361 3.1755 1250.0146 94.6662 1258.0689 25.6122 1333.4544 115.3666 1444.9060 96.2140 1474.6392 0.2878 1604.5610 58.4422 1630.9742 34.4613 2636.6222 77.6239 2737.5860 21.2480 2745.9489 233.0648 2756.3587 214.6328

>  <gAAFreq>
1046.3408417

>  <gAlpha>
72.95

>  <gCOSMO_DPSA1>
-16.3678412270001

>  <gCOSMO_DPSA2>
-8.0618952540101

>  <gCOSMO_NCD>
-0.00499623410644148

>  <gCOSMO_NEG>
-0.492544809359508

>  <gCOSMO_PCD>
0.00599090900680561

>  <gCOSMO_PNSA1>
98.5832126490001

>  <gCOSMO_PNSA2>
-48.5566496802496

>  <gCOSMO_POS>
0.492544809149928

>  <gCOSMO_PPSA1>
82.215371422

>  <gCOSMO_PPSA2>
40.4947544262395

>  <gCOSMO_SA>
180.798584071

>  <gCOSMO_SKW>
0.54643002523

>  <gCOSMO_VAR>
0.00692393043054713

>  <gCOSMO_Vol>
167.141981607954

>  <gCvib>
12.6954699037459

>  <gDPSA1>
-65.71527629439

>  <gDPSA2>
-76.6311751243748

>  <gDipole>
2.8934

>  <gEN>
-5.447286492133e+02
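To locate the offending atom without a GUI, one can sum the bond orders from the molfile's bond block for each atom and compare each sum with a maximum valence. The stdlib-only sketch below does this for the atom and bond tables transcribed from Wendy's file. `find_valence_errors` and the `MAX_VALENCE` table are hypothetical and deliberately simplified: they cover only the elements in this molecule and are not RDKit's valence model. Charges are passed as a plain dictionary standing in for the `M  CHG` entry.

```python
from collections import defaultdict

# Simplified maximum valences keyed by (element, formal charge).
# Illustrative assumption only; this is NOT RDKit's valence table.
MAX_VALENCE = {
    ("H", 0): 1,
    ("C", 0): 4,
    ("N", 0): 3,
    ("N", 1): 4,
    ("F", 0): 1,
    ("Br", 0): 1,
}


def find_valence_errors(symbols, bonds, charges=None):
    """Return (atom_number, symbol, charge, valence) for every atom whose
    bond-order sum exceeds the allowed maximum.

    symbols: element symbols in molfile order (atom numbers start at 1)
    bonds:   (atom1, atom2, order) tuples, as in the V2000 bond block
    charges: {atom_number: formal_charge}, like an "M  CHG" entry
    """
    charges = charges or {}
    valence = defaultdict(int)
    for a1, a2, order in bonds:
        valence[a1] += order
        valence[a2] += order
    errors = []
    for num, sym in enumerate(symbols, start=1):
        chg = charges.get(num, 0)
        limit = MAX_VALENCE.get((sym, chg))  # None: element/charge not modeled
        if limit is not None and valence[num] > limit:
            errors.append((num, sym, chg, valence[num]))
    return errors


# Atom symbols and bond table transcribed from the molfile in this thread.
SYMBOLS = ["H", "C", "C", "C", "H", "C", "N", "C", "H", "C", "F", "F", "H", "Br"]
BONDS = [
    (2, 1, 1), (2, 3, 1), (3, 5, 1), (4, 2, 1), (6, 3, 2), (6, 9, 1),
    (7, 4, 2), (7, 10, 2), (8, 4, 1), (8, 11, 1), (10, 6, 1), (12, 8, 1),
    (13, 8, 1), (14, 10, 1),
]

print(find_valence_errors(SYMBOLS, BONDS))          # → [(7, 'N', 0, 4)]
print(find_valence_errors(SYMBOLS, BONDS, {7: 1}))  # → []
```

On the original table the sketch flags atom 7, the nitrogen with two double bonds; once the +1 charge from the fixed file is supplied, nothing is flagged. RDKit numbers atoms from 0 in its error messages, so molfile atom 7 would be reported there as atom # 6.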
[Rdkit-discuss] Feature generation in postgres cartridge
Hi Greg,

I was using the postgres cartridge and found that there are several implementations of chemical features. Some of the ones I tried, like maccs and morganbv_fp, generate hexadecimal values. When I convert the hexadecimal to binary, I find that maccs has 168 values and morganbv_fp has 512 binary values. I may be wrong in my understanding, so I just want to make sure whether I am correct. If I am, how can I generate 1024 binary values, or is it restricted to 512? I found that the binary values are different when using two different radii, which is what I expect. Can these binary values be extended to 1024 bits or more? If not, doesn't that cause errors in the similarity calculation?

Using radius 4:

chembl_18=# select morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',4);
\x104008800230218340002440c540250700100c4843840200400c000846208005008188a00082084802411e0820a481400860a80408404241000441006008

Using radius 6:

chembl_18=# select morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',6);
\x104408800230218340003c42c540250700100c4843840200400c00084e20c005008188a000c2884802415e0820a481400862a8042842424102044100600c

Thanks,
Abhik

Abhik Seal
Indiana University Bloomington
School of Informatics and Computing
Cheminformatics and Chemgenomics group
http://registratio54.wix.com/ccrg
abs...@indiana.edu
http://mypage.iu.edu/~abseal/index.htm

--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Feature generation in postgres cartridge
Hi Abhik,

On Sat, Jul 12, 2014 at 9:38 PM, Abhik Seal abhik1...@gmail.com wrote:
> I was using the postgres cartridge and found there are several implementations of chemical features. Some of the ones I tried, like maccs and morganbv_fp, generate hexadecimal values. When I convert the hexadecimal to binary, I find that maccs has 168 values and morganbv_fp has 512 binary values.

The size of the MACCS fingerprint comes from its definition: there is a fixed set of predefined features that the code searches for, one bit per feature. The Morgan fingerprints, on the other hand, have a variable size that the user can select. The default value in the cartridge is, as you have discovered, 512 bits.

> I may be wrong in my understanding, so I just want to make sure whether I am correct. If I am, how can I generate 1024 binary values, or is it restricted to 512? I found that the binary values are different using two different radii, which is what I expect. Can these binary values be extended to 1024 bits or more? If not, doesn't that cause errors in the similarity calculation?

You can change the size of the fingerprints using configuration variables:

contrib_regression=# select morganbv_fp('c1c1C');
 morganbv_fp
 \x0080020001844080020010002100
(1 row)

contrib_regression=# set rdkit.morgan_fp_size=1024;
SET
contrib_regression=# select morganbv_fp('c1c1C');
 morganbv_fp
 \x00800200018010002004408002000100
(1 row)

The options available are:

rdkit.dice_threshold
rdkit.layered_fp_size
rdkit.do_chiral_sss
rdkit.morgan_fp_size
rdkit.featmorgan_fp_size
rdkit.rdkit_fp_size
rdkit.hashed_atompair_fp_size
rdkit.ss_fp_size
rdkit.hashed_torsion_fp_size
rdkit.tanimoto_threshold

Note that changing a configuration variable this way only affects the current session.
If you want to make it the default for the database as a whole, you need to change the database configuration:

contrib_regression=# alter database contrib_regression set rdkit.morgan_fp_size=1024;
ALTER DATABASE

Then disconnect (close psql) and reconnect to pick up the new setting.

I hope this helps,
-greg
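On the similarity question, here is a minimal sketch, assuming each hashed Morgan environment identifier is folded into the bit vector by taking it modulo the fingerprint size. The feature IDs are invented integers, not real Morgan hashes, and `fold`/`tanimoto` are illustrative helpers, not cartridge functions. Fewer bits means more collisions between unrelated features, which can inflate Tanimoto similarity, so scores computed from 512-bit and 1024-bit fingerprints are not directly comparable; fingerprints being compared should share the same size setting.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into bit positions 0..n_bits-1 (modulo folding)."""
    return {f % n_bits for f in feature_ids}


def tanimoto(a, b):
    """Tanimoto similarity of two sets of on-bits: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(union)


# Invented feature identifiers standing in for hashed Morgan environments.
MOL_A = {5, 700, 1500}
MOL_B = {5, 188, 2000}

print(tanimoto(MOL_A, MOL_B))                          # unfolded   → 0.2
print(tanimoto(fold(MOL_A, 1024), fold(MOL_B, 1024)))  # 1024 bits  → 0.2
print(tanimoto(fold(MOL_A, 512), fold(MOL_B, 512)))    # 512 bits   → 0.5
```

With 512 bits, features 700 and 188 land on the same bit (700 mod 512 = 188), so the two made-up molecules appear to share two bits instead of one and the score rises from 0.2 to 0.5.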
Re: [Rdkit-discuss] valence problem
Hi Adrian,

On Thu, Jul 10, 2014 at 12:42 PM, Adrian Jasiński jasinski.adr...@gmail.com wrote:
> Hi all,
> I have a problem with generating a molecule from SMILES:
>
> from rdkit import Chem
> template = Chem.MolFromSmiles('F[P-](F)(F)(F)(F)F.CN(C)C(F)=[N+](C)C')
>
> I got an error:
> Explicit valence for atom # 1 P, 7, is greater than permitted
>
> But the SMILES for this structure should be valid. I checked many web services and the structure is always the same; the CAS number for this structure is 164298-23-1.

The default RDKit behavior is to reject hypervalent P. This is probably something I should change, given how frequently the PF6- anion occurs.

> Can I skip the valence check when generating a mol from SMILES?

You can, but you probably want to at least do a partial sanitization so that the molecule is actually useful:

In [14]: m = Chem.MolFromSmiles('F[P-](F)(F)(F)(F)F.CN(C)C(F)=[N+](C)C',sanitize=False)

In [15]: m.UpdatePropertyCache(strict=False)

In [16]: Chem.SanitizeMol(m,
    ...:     Chem.SanitizeFlags.SANITIZE_FINDRADICALS|Chem.SanitizeFlags.SANITIZE_KEKULIZE|
    ...:     Chem.SanitizeFlags.SANITIZE_SETAROMATICITY|Chem.SanitizeFlags.SANITIZE_SETCONJUGATION|
    ...:     Chem.SanitizeFlags.SANITIZE_SETHYBRIDIZATION|Chem.SanitizeFlags.SANITIZE_SYMMRINGS,
    ...:     catchErrors=True)
Out[16]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

-greg
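In the `SanitizeMol` call above, `catchErrors=True` makes RDKit return the flag of the operation that failed instead of raising an exception, so `SANITIZE_NONE` in `Out[16]` means none of the requested steps failed. The valence check lives in `SANITIZE_PROPERTIES`, which is left out of the bitmask. The control flow can be modeled with a stdlib-only toy: the `Step` flags, their values, the step order, and the one-line valence rule below are all hypothetical stand-ins, not RDKit's implementation.

```python
from enum import IntFlag


class Step(IntFlag):
    """Hypothetical sanitization-step flags; the values are arbitrary."""
    NONE = 0
    PROPERTIES = 1      # valence check
    KEKULIZE = 2
    FINDRADICALS = 4
    SETAROMATICITY = 8


def check_valence(mol):
    """Toy valence rule (assumption): reject any P with a bond-order sum above 5."""
    return all(not (symbol == "P" and valence > 5) for symbol, valence in mol)


def always_ok(mol):
    """Placeholder for steps this toy does not model."""
    return True


# Steps run in a fixed order, each paired with a function that reports success.
STEPS = [
    (Step.PROPERTIES, check_valence),
    (Step.KEKULIZE, always_ok),
    (Step.FINDRADICALS, always_ok),
    (Step.SETAROMATICITY, always_ok),
]


def sanitize(mol, ops, steps=STEPS):
    """Run only the steps selected in `ops`; return the first failing flag,
    or Step.NONE if every selected step succeeds (nothing is raised)."""
    for flag, step in steps:
        if ops & flag and not step(mol):
            return flag
    return Step.NONE


# PF6- modeled as (symbol, bond-order sum) pairs: P has six single bonds to F.
PF6 = [("P", 6)] + [("F", 1)] * 6

print(sanitize(PF6, Step.PROPERTIES | Step.KEKULIZE))
print(sanitize(PF6, Step.KEKULIZE | Step.FINDRADICALS | Step.SETAROMATICITY))
```

The first call reports `Step.PROPERTIES` because the toy valence rule rejects six bonds to P; the second omits that step from the mask and reports `Step.NONE`, mirroring the result in `Out[16]`.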