Re: [Rdkit-discuss] Explicit valence error when reading sdf files

2014-07-12 Thread JP
On 11 July 2014 23:41, Wendy Carande wcara...@gmail.com wrote:

 10104489
   TRC 05231419153D
 PM6 optimization, min free energy conformation
  14 14  0  0  0  0  0  0  0  0999 V2000
    -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
     0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
     1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
     1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
     1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.1378   -1.2635   -0.2668 N   0  0  0  0  0  0  0  0  0  0  0  0
    -2.1710    0.0301   -0.5321 C   0  0  0  0  0  0  0  0  0  0  0  0
     3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
     1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
    -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
    -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
    -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
     1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
   2  1  1  0  0  0  0
   2  3  1  0  0  0  0
   3  5  1  0  0  0  0
   4  2  1  0  0  0  0
   6  3  2  0  0  0  0
   6  9  1  0  0  0  0
   7  4  2  0  0  0  0
   7 10  2  0  0  0  0
   8  4  1  0  0  0  0
   8 11  1  0  0  0  0
  10  6  1  0  0  0  0
  12  8  1  0  0  0  0
  13  8  1  0  0  0  0
  14 10  1  0  0  0  0
 M  RAD  1   2   2
 M  END




This is not a problem with RDKit, but a chemistry problem.

Your structure has a tetravalent N: the uncharged ring nitrogen (atom 7)
carries two double bonds, for a total bond order of 4.  If you add a +
charge to that nitrogen (an M  CHG line in the SDF, see below), RDKit is
able to read in your structure.  You can easily do this using a free
program such as MarvinSketch (it also shows you where your original error
is).


[image: Inline images 1]
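
To see where the valence of 4 comes from, one can sum the bond orders at each atom directly from the bond block of the original file. This is only a minimal, stdlib-only sketch and not how RDKit itself checks valences; the helper name total_bond_orders is made up for illustration.

```python
# Bond block of the original (failing) molfile, copied from the message above.
# Each V2000 bond line starts with: atom1, atom2, bond order (3 chars each).
BOND_BLOCK = """\
  2  1  1  0  0  0  0
  2  3  1  0  0  0  0
  3  5  1  0  0  0  0
  4  2  1  0  0  0  0
  6  3  2  0  0  0  0
  6  9  1  0  0  0  0
  7  4  2  0  0  0  0
  7 10  2  0  0  0  0
  8  4  1  0  0  0  0
  8 11  1  0  0  0  0
 10  6  1  0  0  0  0
 12  8  1  0  0  0  0
 13  8  1  0  0  0  0
 14 10  1  0  0  0  0
"""

# Element symbols in atom-block order (atom 1 first).
SYMBOLS = ["H", "C", "C", "C", "H", "C", "N", "C", "H", "C", "F", "F", "H", "Br"]


def total_bond_orders(bond_block, n_atoms):
    """Sum the bond orders at each atom; returns {atom_number: total}."""
    totals = {i: 0 for i in range(1, n_atoms + 1)}
    for line in bond_block.splitlines():
        # Fixed-width fields: columns 0-2, 3-5 and 6-8.
        a1, a2, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        totals[a1] += order
        totals[a2] += order
    return totals


totals = total_bond_orders(BOND_BLOCK, len(SYMBOLS))
for number, symbol in enumerate(SYMBOLS, start=1):
    print(number, symbol, totals[number])
```

Atom 7 (N) comes out at 4, one more than the 3 that a neutral nitrogen normally takes, while atom 2 (the carbon flagged by the M  RAD line) comes out at 3.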

--- PYTHON CODE 

>>> import rdkit
>>> from rdkit import Chem
>>> s = Chem.SDMolSupplier('/tmp/test_fixed.sdf')
>>> next(s)  # s.next() on Python 2
<rdkit.Chem.rdchem.Mol object at 0x7fe783d21360>
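
If you would rather script the M  CHG edit than use a sketcher, something along these lines might work. It is a sketch under assumptions: the function name add_formal_charge is hypothetical, the block is assumed to have no existing M  CHG line, and only the property line is added (MarvinSketch also sets the charge column on the atom line, which this does not).

```python
def add_formal_charge(molblock, atom_number, charge):
    """Return a copy of a V2000 molblock with one 'M  CHG' entry added.

    Assumes there is no 'M  CHG' line yet and that the properties block
    contains at least 'M  END'.
    """
    lines = molblock.splitlines()
    # The properties block starts at the first line beginning with 'M  '.
    first_prop = next(i for i, line in enumerate(lines) if line.startswith("M  "))
    # Layout: 'M  CHG', entry count (3 chars), then ' aaa vvv' per entry.
    lines.insert(first_prop, "M  CHG%3d %3d %3d" % (1, atom_number, charge))
    return "\n".join(lines) + "\n"


# Tail of the original file, used here as a stand-in for the full molblock.
tail = "M  RAD  1   2   2\nM  END\n"
patched = add_formal_charge(tail, atom_number=7, charge=1)
print(patched)
```

The inserted line has the same layout as the M  CHG line in the fixed file below.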



--- FIXED SDF FILE 


  Mrv0541 07121410173D -76.23192
PM6 optimization, min free energy conformation
 14 14  0  0  0  0            999 V2000
   -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1378   -1.2635   -0.2668 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.1710    0.0301   -0.5321 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  3  1  0  0  0  0
  3  5  1  0  0  0  0
  4  2  1  0  0  0  0
  6  3  2  0  0  0  0
  6  9  1  0  0  0  0
  7  4  2  0  0  0  0
  7 10  2  0  0  0  0
  8  4  1  0  0  0  0
  8 11  1  0  0  0  0
 10  6  1  0  0  0  0
 12  8  1  0  0  0  0
 13  8  1  0  0  0  0
 14 10  1  0  0  0  0
M  CHG  1   7   1
M  RAD  1   2   2
M  END
>  <Symmetry>
Cs(1)

>  <FreeEnergy>
-105.218958525368

>  <Freq>
36.3827 1.8490
118.9528 0.6797
121.3287 1.7880
146.7422 3.2258
230.2547 4.6726
300.5702 5.7138
328.8019 1.4348
361.6117 0.5034
402.5823 0.1183
552.4995 20.1778
573.7578 0.7088
621.4207 13.4753
682.9353 27.8339
701.6618 5.4059
844.3396 76.4557
881.5112 51.8745
935.0009 0.0366
986.5135 4.6250
1020.1213 12.4436
1073.2578 27.3170
1132.0055 17.4835
1149.0508 5.7188
1174.1069 3.8183
1193.8903 14.3170
1225.8361 3.1755
1250.0146 94.6662
1258.0689 25.6122
1333.4544 115.3666
1444.9060 96.2140
1474.6392 0.2878
1604.5610 58.4422
1630.9742 34.4613
2636.6222 77.6239
2737.5860 21.2480
2745.9489 233.0648
2756.3587 214.6328

>  <gAAFreq>
1046.3408417

>  <gAlpha>
72.95

>  <gCOSMO_DPSA1>
-16.3678412270001

>  <gCOSMO_DPSA2>
-8.0618952540101

>  <gCOSMO_NCD>
-0.00499623410644148

>  <gCOSMO_NEG>
-0.492544809359508

>  <gCOSMO_PCD>
0.00599090900680561

>  <gCOSMO_PNSA1>
98.5832126490001

>  <gCOSMO_PNSA2>
-48.5566496802496

>  <gCOSMO_POS>
0.492544809149928

>  <gCOSMO_PPSA1>
82.215371422

>  <gCOSMO_PPSA2>
40.4947544262395

>  <gCOSMO_SA>
180.798584071

>  <gCOSMO_SKW>
0.54643002523

>  <gCOSMO_VAR>
0.00692393043054713

>  <gCOSMO_Vol>
167.141981607954

>  <gCvib>
12.6954699037459

>  <gDPSA1>
-65.71527629439

>  <gDPSA2>
-76.6311751243748

>  <gDipole>
2.8934

>  <gEN>
-5.447286492133e+02

  

[Rdkit-discuss] Feature generation in postgres cartridge

2014-07-12 Thread Abhik Seal
Hi Greg,

I was using the postgres cartridge and found that there are several
implementations of chemical features. I tried some of them, such as maccs
and morganbv_fp, and found that they generate hexadecimal values. When I
convert the hexadecimal to binary, maccs has 168 values and morganbv_fp has
512 binary values.

I may be misunderstanding something, so I just want to check whether this
is correct. If it is, how can I generate 1024 binary values, or is it
restricted to 512? The binary values differ between two different radii,
which is what I expect. Can these binary values be extended to 1024 bits
and so on? And if so, doesn't that cause errors in the similarity
calculation?

Using radius 4:

chembl_18=# select
morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',4);

\x104008800230218340002440c540250700100c4843840200400c000846208005008188a00082084802411e0820a481400860a80408404241000441006008

Using radius 6:

chembl_18=# select
morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',6);

\x104408800230218340003c42c540250700100c4843840200400c00084e20c005008188a000c2884802415e0820a481400862a8042842424102044100600c


Thanks
Abhik

Abhik Seal
Indiana University Bloomington
School of Informatics and Computing
Cheminformatics and Chemgenomics group http://registratio54.wix.com/ccrg
abs...@indiana.edu
http://mypage.iu.edu/~abseal/index.htm
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Feature generation in postgres cartridge

2014-07-12 Thread Greg Landrum
Hi Abhik,

On Sat, Jul 12, 2014 at 9:38 PM, Abhik Seal abhik1...@gmail.com wrote:


 I was using the postgres cartridge and found that there are several
 implementations of chemical features. I tried some of them, such as maccs
 and morganbv_fp, and found that they generate hexadecimal values. When I
 convert the hexadecimal to binary, maccs has 168 values and morganbv_fp has
 512 binary values.


The size of the MACCS fingerprint comes from its definition: there are a
certain number of defined features that the code searches for.
The Morgan fingerprints, on the other hand, have a variable size selectable
by the user. The default value in the cartridge is, as you have discovered,
512 bits.
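
For checking the sizes yourself: each hexadecimal digit in the bytea output encodes 4 bits, so 128 hex digits correspond to 512 bits. The following is only a sketch; the function name bytea_hex_to_bits and the two-byte example value are made up for illustration. Because the value is stored as whole bytes, the bit count is always a multiple of 8, which is presumably why the 167-bit MACCS fingerprint shows up as 168 values.

```python
def bytea_hex_to_bits(hex_text):
    """Convert PostgreSQL bytea hex output (e.g. '\\x1040') to a '0'/'1' string."""
    # Drop the leading '\x' marker that psql prints for bytea values.
    digits = hex_text[2:] if hex_text.startswith("\\x") else hex_text
    raw = bytes.fromhex(digits)
    # Eight bits per byte, most significant bit first within each byte.
    return "".join(format(byte, "08b") for byte in raw)


example = "\\x1040"  # hypothetical two-byte value, not a real fingerprint
bits = bytea_hex_to_bits(example)
print(len(bits), bits.count("1"))
```

The length and the number of set bits do not depend on the bit order within a byte; mapping string positions back to fingerprint bit numbers would, and the order used here is an assumption.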


 I may be misunderstanding something, so I just want to check whether this
 is correct. If it is, how can I generate 1024 binary values, or is it
 restricted to 512? The binary values differ between two different radii,
 which is what I expect. Can these binary values be extended to 1024 bits
 and so on? And if so, doesn't that cause errors in the similarity
 calculation?


You can change the size of the fingerprints using configuration variables:

contrib_regression=# select morganbv_fp('c1ccccc1C');
morganbv_fp
\x0080020001844080020010002100
(1 row)

contrib_regression=# set rdkit.morgan_fp_size=1024;
SET
contrib_regression=# select morganbv_fp('c1ccccc1C');
morganbv_fp
\x00800200018010002004408002000100
(1 row)

The options available are:
rdkit.dice_threshold           rdkit.layered_fp_size
rdkit.do_chiral_sss            rdkit.morgan_fp_size
rdkit.featmorgan_fp_size       rdkit.rdkit_fp_size
rdkit.hashed_atompair_fp_size  rdkit.ss_fp_size
rdkit.hashed_torsion_fp_size   rdkit.tanimoto_threshold


Note that a change to a configuration variable as done here only affects
the current session. If you want to make it the default for the database as
a whole you need to change the database configuration:

contrib_regression=# alter database contrib_regression set
rdkit.morgan_fp_size=1024;
ALTER DATABASE

Then disconnect (close psql) and reconnect to pick up the new setting.

I hope this helps,
-greg


Re: [Rdkit-discuss] valence problem

2014-07-12 Thread Greg Landrum
Hi Adrian,

On Thu, Jul 10, 2014 at 12:42 PM, Adrian Jasiński
jasinski.adr...@gmail.com wrote:

 Hi all
 I have a problem with generating molecule from smiles.

 from rdkit import Chem
 template = Chem.MolFromSmiles('F[P-](F)(F)(F)(F)F.CN(C)C(F)=[N+](C)C')

 I got an error:
  Explicit valence for atom # 1 P, 7, is greater than permitted

 But the SMILES for this structure should be valid.
 I checked many web services and the structure is always the same
 the CAS number for this structure is 164298-23-1


The default RDKit behavior is to reject hypervalent P. This is probably
something I should change given how frequently the PF6- anion occurs.


 Can I skip checking the valence during generating mol from smiles?


You can, but you probably want to at least do a partial sanitization so
that the molecule is actually useful:

In [14]: m = Chem.MolFromSmiles('F[P-](F)(F)(F)(F)F.CN(C)C(F)=[N+](C)C',
   ....:                        sanitize=False)

In [15]: m.UpdatePropertyCache(strict=False)

In [16]: Chem.SanitizeMol(m,
   ....:     Chem.SanitizeFlags.SANITIZE_FINDRADICALS |
   ....:     Chem.SanitizeFlags.SANITIZE_KEKULIZE |
   ....:     Chem.SanitizeFlags.SANITIZE_SETAROMATICITY |
   ....:     Chem.SanitizeFlags.SANITIZE_SETCONJUGATION |
   ....:     Chem.SanitizeFlags.SANITIZE_SETHYBRIDIZATION |
   ....:     Chem.SanitizeFlags.SANITIZE_SYMMRINGS,
   ....:     catchErrors=True)
Out[16]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

With catchErrors=True, SanitizeMol returns the flag of the step that
failed; SANITIZE_NONE means that all of the requested steps succeeded.

-greg