Hi all,

I have been using the SubstructureFingerprinter from CDK to generate
fingerprints for a project. I first tried using my own substructures for
fingerprint generation and realized that the bits set to true are only a
subset of the correct list of on-bits, i.e., certain bits that should be
true are left false.
I used the default SMARTS array from CDK, as in the following code:

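import org.openscience.cdk.fingerprint.IBitFingerprint;
import org.openscience.cdk.fingerprint.SubstructureFingerprinter;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
SmilesParser sp = new SmilesParser(builder);
IAtomContainer mol = sp.parseSmiles(
    "CN(C)c1cc(OS(O)(O)=O)nc(n1)-c1cncc(c1)C(O)=O");

// No-argument constructor: uses the default SMARTS patterns shipped with CDK
SubstructureFingerprinter sf = new SubstructureFingerprinter();
IBitFingerprint fingerprint = sf.getBitFingerprint(mol);

// Print the index of every on-bit together with its SMARTS pattern
int[] setBits = fingerprint.getSetbits();
for (int k = 0; k < setBits.length; k++) {
    System.out.println(setBits[k] + " - " + sf.getSubstructure(setBits[k]));
}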

This returns the following result:

83 - [CX3;$([R0][#6]),$([H1R0])](=[OX1])[$([OX2H]),$([OX1-])]
87 - [$([#6X3H0][#6]),$([#6X3H])](=[!#6])[!#6]
120 - [#6][#6X3R;$([H0](=[NX2;!$(N(=[#6X3][#7X3])C=[O,S])])[#7X3;!$(N([#6X3]=[#7X2])C=[O,S])]),$([H0](-[NX3;!$(N([#6X3]=[#7X2])C=[O,S])])=,:[#7X2;!$(N(=[#6X3][#7X3])C=[O,S])])]
134 - [#6X3](=[OX1])[#6X3]=,:[#6X3][#7,#8,#16,F,Cl,Br,I]
136 - [#6X3](=[OX1])[#6X3]=,:[#6X3][#6;!$(C=[O,N,S])]
180 - [nX2,nX3+]
183 - [a;!c]
273 - a
274 - [!#6;!R0]
286 - *=*[*]=,#,:[*]
294 - [#6]~[#7,#8,#16]
299 - [$([#7X2,OX1,SX1]=*[!H0;!$([a;!n])]),$([#7X3,OX2,SX2;!H0]*=*),$([#7X3,OX2,SX2;!H0]*:n)]
301 - [!$(*#*)&!D1]-!@[!$(*#*)&!D1]
306 - [$([*@](~*)(~*)(*)*),$([*@H](*)(*)*),$([*@](~*)(*)*),$([*@H](~*)~*)]

Comparing this output with the original list of substructures (
http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/fingerprint/SubstructureFingerprinter.html),
you can see that the output is missing at least one pattern (substructure)
that should match, namely "Tertiary mixed amine" (32).

Moreover, I tried the following code, to retrieve every bit value:

for (int k = 0; k < fingerprint.cardinality(); k++) {
    System.out.println(k + " " + fingerprint.get(k));
}

I expected this to print "true" for the 14 on-bits, but every value it
prints is "false":

0 false
1 false
2 false
3 false
4 false
5 false
6 false
7 false
8 false
9 false
10 false
11 false
12 false
13 false
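
One thing I have not ruled out: if IBitFingerprint.cardinality() counts the
on-bits the way java.util.BitSet.cardinality() does (an assumption on my
part, I have not checked the CDK source), then the loop above only visits
indices 0 to 13 and never reaches the first on-bit at 83. Here is a minimal
sketch with a plain BitSet standing in for the fingerprint; the class and
method names are hypothetical and not part of the CDK API:

```java
import java.util.BitSet;

public class CardinalityDemo {

    // Builds a BitSet holding the 14 on-bit indices from the output above.
    // Assumption: a plain BitSet is used as a stand-in for IBitFingerprint.
    static BitSet exampleBits() {
        int[] onBits = {83, 87, 120, 134, 136, 180, 183,
                        273, 274, 286, 294, 299, 301, 306};
        BitSet bits = new BitSet(307);
        for (int index : onBits) {
            bits.set(index);
        }
        return bits;
    }

    // Counts how many on-bits are seen when scanning indices 0..limit-1.
    static int setBitsBelow(BitSet bits, int limit) {
        int seen = 0;
        for (int k = 0; k < limit; k++) {
            if (bits.get(k)) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        BitSet bits = exampleBits();
        // cardinality() is the number of on-bits, not the bit-vector length
        System.out.println(bits.cardinality());                     // 14
        // Scanning only up to cardinality() stops long before index 83
        System.out.println(setBitsBelow(bits, bits.cardinality())); // 0
        // Scanning up to length() (highest on-bit + 1) reaches all of them
        System.out.println(setBitsBelow(bits, bits.length()));      // 14
    }
}
```

If the CDK fingerprint behaves the same way, fingerprint.size() may be the
loop bound I actually want here, but that would still not explain the
missing patterns.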


Can you help me out here? Am I missing something, or has anyone encountered
the same problems? Why are patterns missing? In the custom array of
patterns I used to initialize my own SubstructureFingerprinter, I added a
pattern for pyrimidine, and it was not found either, although the compound
itself contains a pyrimidine ring. Moreover, why are the 14 bits set to
false?

Thank you for your help.

Best,