[Rdkit-discuss] Feature generation in postgres cartridge

2014-07-12 Thread Abhik Seal
Hi Greg,

I was using the postgres cartridge and found that there are several
implementations for chemical features. I tried some of them, like maccs and
morganbv_fp, and found that they generate hexadecimal values. When I convert
the hexadecimal to binary, I find that maccs has 168 values and morganbv_fp
has 512 binary values.

I may be misunderstanding something, so I just want to check whether I am
correct. If I am, how can I generate 1024 binary values, or is it restricted
to 512? I found that the binary values differ between two different radii,
which is what I expect. Can these binary values be extended to 1024 bits and
beyond? And if so, doesn't that cause errors in the similarity calculation?

Using radius 4:

chembl_18=# select morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',4);
\x104008800230218340002440c540250700100c4843840200400c000846208005008188a00082084802411e0820a481400860a80408404241000441006008

Using radius 6:

chembl_18=# select morganbv_fp('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1',6);
\x104408800230218340003c42c540250700100c4843840200400c00084e20c005008188a000c2884802415e0820a481400862a8042842424102044100600c


Thanks
Abhik

Abhik Seal
Indiana University Bloomington
School of Informatics and Computing
Cheminformatics and Chemgenomics group http://registratio54.wix.com/ccrg
abs...@indiana.edu
http://mypage.iu.edu/~abseal/index.htm
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Feature generation in postgres cartridge

2014-07-12 Thread Greg Landrum
Hi Abhik,

On Sat, Jul 12, 2014 at 9:38 PM, Abhik Seal abhik1...@gmail.com wrote:


> I was using the postgres cartridge and found that there are several
> implementations for chemical features. I tried some of them, like maccs and
> morganbv_fp, and found that they generate hexadecimal values. When I convert
> the hexadecimal to binary, I find that maccs has 168 values and morganbv_fp
> has 512 binary values.


The size of the MACCS fingerprint comes from its definition: there are a
certain number of defined features that the code searches for.
The Morgan fingerprints, on the other hand, have a variable size selectable
by the user. The default value in the cartridge is, as you have discovered,
512 bits.
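
To confirm the bit length without decoding the hex by hand, you can ask the cartridge directly. This is a minimal sketch, assuming the cartridge's size() function for bit-vector fingerprints and the maccs_fp() function are available in the installed version. The toluene SMILES is only an illustrative input, not one from this thread.

```sql
-- Bit length of the Morgan bit-vector fingerprint (radius 2)
-- under the current rdkit.morgan_fp_size setting.
select size(morganbv_fp('Cc1ccccc1'::mol, 2));

-- Bit length of the MACCS fingerprint, which is fixed by its definition
-- and does not depend on any configuration variable.
select size(maccs_fp('Cc1ccccc1'::mol));
```

Note that the hex output is byte-aligned, so counting bits from the hex text can give a figure rounded up to a multiple of 8.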


> I may be misunderstanding something, so I just want to check whether I am
> correct. If I am, how can I generate 1024 binary values, or is it restricted
> to 512? I found that the binary values differ between two different radii,
> which is what I expect. Can these binary values be extended to 1024 bits and
> beyond? And if so, doesn't that cause errors in the similarity calculation?


You can change the size of the fingerprints using configuration variables:

contrib_regression=# select morganbv_fp('c1c1C');
 morganbv_fp
\x0080020001844080020010002100
(1 row)

contrib_regression=# set rdkit.morgan_fp_size=1024;
SET
contrib_regression=# select morganbv_fp('c1c1C');
 morganbv_fp
\x00800200018010002004408002000100
(1 row)

The options available are:

rdkit.dice_threshold            rdkit.layered_fp_size
rdkit.do_chiral_sss             rdkit.morgan_fp_size
rdkit.featmorgan_fp_size        rdkit.rdkit_fp_size
rdkit.hashed_atompair_fp_size   rdkit.ss_fp_size
rdkit.hashed_torsion_fp_size    rdkit.tanimoto_threshold
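
Any of the settings listed above can be inspected and reset with the standard PostgreSQL SHOW, SET, and RESET commands. This is a minimal sketch. One assumption: the rdkit.* variables may only be visible once the cartridge library has been loaded in the session, for example after calling any cartridge function.

```sql
-- Inspect the current value of the Morgan fingerprint size.
show rdkit.morgan_fp_size;

-- Change it for the current session only.
set rdkit.morgan_fp_size = 1024;

-- Return to the default configured for this database.
reset rdkit.morgan_fp_size;
```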


Note that a change to a configuration variable as done here only affects
the current session. If you want to make it the default for the database as
a whole you need to change the database configuration:

contrib_regression=# alter database contrib_regression set rdkit.morgan_fp_size=1024;
ALTER DATABASE

Then disconnect (close psql) and reconnect to pick up the new setting.
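
After reconnecting, the setting can be checked and a similarity computed with both fingerprints generated under the same size. This is a minimal sketch, and the two SMILES are illustrative only. On the similarity question: bit positions hashed at different sizes do not correspond, so fingerprints generated at different sizes should not be compared. Any fingerprints already stored in tables under the old size would presumably need to be regenerated. That last point is an assumption rather than something stated in this thread.

```sql
-- Confirm that the new session picked up the database-level default.
show rdkit.morgan_fp_size;

-- Both fingerprints are computed in the same session, so they share
-- the same bit length and can be compared directly.
select tanimoto_sml(morganbv_fp('Cc1ccccc1'::mol, 2),
                    morganbv_fp('c1ccccc1'::mol, 2));
```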

I hope this helps,
-greg