In answer to my own question, I found a post from 2013: passing
trianglePruneBins=False to the SigFactory constructor solved the problem. However,
I'm still unsure whether the definition file I used is fit for purpose.
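For reference, here is a minimal sketch of the change. To run without a local copy of the docs tree, it assumes BaseFeatures.fdef from RDKit's data directory (RDConfig.RDDataDir) instead of MinimalFeatures.fdef. The helper make_sig_factory is a hypothetical name. The sketch builds the same three-bin factory with and without triangle pruning, then fingerprints abacavir with the unpruned one.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem.Pharm2D import Generate
from rdkit.Chem.Pharm2D.SigFactory import SigFactory

# Assumption: BaseFeatures.fdef from the RDKit data directory stands in for
# MinimalFeatures.fdef so the sketch runs without the docs tree.
fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)


def make_sig_factory(prune):
    """Hypothetical helper: 2-3 point factory with three topological distance bins."""
    factory = SigFactory(feat_factory, minPointCount=2, maxPointCount=3,
                         trianglePruneBins=prune)
    factory.SetBins([(0, 2), (2, 5), (5, 8)])
    factory.Init()
    return factory


pruned = make_sig_factory(prune=True)     # the default behaviour
unpruned = make_sig_factory(prune=False)  # keeps every bin combination

# Without pruning, bin triples such as (2, 0, 0) stay in the signature,
# so the signature has more bits.
print(pruned.GetSigSize(), unpruned.GetSigSize())

abacavir = Chem.MolFromSmiles('C1CC1NC2=C3C(=NC(=N2)N)N(C=N3)C4CC(C=C4)CO')
fp = Generate.Gen2DFingerprint(abacavir, unpruned)
print(fp.GetNumOnBits())
```

The trade-off is a somewhat longer fingerprint, since bin combinations that pruning would have discarded now get their own bits.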
Thanks
Anthony
From: Anthony Nash
Sent: 14 June 2021 14:31
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Error whilst setting up 2D Pharmacophore distance bins
Dear all,
I'm setting up a small library of 2D pharmacophore fingerprints. Although I
understand the theory behind 2D pharmacophores, this is the first time I've
worked with them and therefore I would appreciate your wisdom/guidance.
I'm using the example code from the RDKit HTML documentation:
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
fdefNameStr: str = "MinimalFeatures.fdef"
featFactory = ChemicalFeatures.BuildFeatureFactory(fdefNameStr)
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
sigFactory = SigFactory(featFactory, minPointCount=2, maxPointCount=3)
sigFactory.SetBins([(0,2),(2,5),(5,8)])
sigFactory.Init()
sigFactory.GetSignature()
Note: I took MinimalFeatures.fdef from the GitHub repository
(rdkit/Docs/Book/data/MinimalFeatures.fdef). I'm not sure whether this was the
right thing to do, as I don't have my own set of pharmacophore definitions.
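On the question of which definitions to use: RDKit also ships a fuller feature definition file, BaseFeatures.fdef, in RDConfig.RDDataDir, and a pre-configured signature factory in rdkit.Chem.Pharm2D.Gobbi_Pharm2D. The sketch below only inspects what each provides. Whether either is fit for purpose for a given library is a separate judgement.

```python
import os

from rdkit import RDConfig
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem.Pharm2D import Gobbi_Pharm2D

# BaseFeatures.fdef ships with RDKit in its data directory.
fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

# The feature families defined by this file; a SigFactory builds its
# 2- and 3-point pharmacophores from combinations of these.
families = list(feat_factory.GetFeatureFamilies())
print(families)

# Gobbi_Pharm2D.factory is an already-initialised SigFactory.
print(Gobbi_Pharm2D.factory.GetSigSize())
```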
Using the following code I'm able to generate 2D pharmacophores for some of my
compounds:
from rdkit.Chem.Pharm2D import Generate
drug.setPharm2DFP(Generate.Gen2DFingerprint(drug.getRDKitMol(), sigFactory))
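When fingerprinting a whole library, it can help to collect the failing compounds instead of stopping at the first exception. This sketch makes three assumptions:

* A plain dict of name to SMILES stands in for the drug objects above. Both library and fingerprint_library are hypothetical names.
* BaseFeatures.fdef replaces MinimalFeatures.fdef.
* The factory uses the default triangle pruning, so any "distance bin not found" errors are caught and recorded.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem.Pharm2D import Generate
from rdkit.Chem.Pharm2D.SigFactory import SigFactory

fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)
sig_factory = SigFactory(feat_factory, minPointCount=2, maxPointCount=3)
sig_factory.SetBins([(0, 2), (2, 5), (5, 8)])
sig_factory.Init()

# Hypothetical input: compound name -> SMILES.
library = {
    'abacavir': 'C1CC1NC2=C3C(=NC(=N2)N)N(C=N3)C4CC(C=C4)CO',
    'docs_example': 'OCC(=O)CCCN',
}


def fingerprint_library(smiles_by_name, factory):
    """Return (fingerprints, failures); failures maps name -> error message."""
    fps, failures = {}, {}
    for name, smi in smiles_by_name.items():
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            failures[name] = 'could not parse SMILES'
            continue
        try:
            fps[name] = Generate.Gen2DFingerprint(mol, factory)
        except IndexError as err:  # GetBitIdx raises "distance bin not found"
            failures[name] = str(err)
    return fps, failures


fps, failures = fingerprint_library(library, sig_factory)
print(sorted(fps), sorted(failures))
```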
However, some compounds cause an exception (please see below the body of this
email). I figure it's happening due to my lack of understanding of
pharmacophores and possibly the use of "MinimalFeatures.fdef".
This is the first compound to throw an exception:
abacavir
C1CC1NC2=C3C(=NC(=N2)N)N(C=N3)C4CC(C=C4)CO
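One way to see where a combination like ['Acceptor', 'Acceptor', 'Aromatic'] comes from is to list abacavir's features and their atoms. The sketch rests on two assumptions:

* BaseFeatures.fdef is used instead of MinimalFeatures.fdef, so the acceptor definitions may differ.
* The distance between two features is taken as the bond count between their closest atoms. This is how I read Generate._ShortestPathsMatch, not a documented guarantee.

feature_distance is a hypothetical helper.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

# Assumption: BaseFeatures.fdef stands in for MinimalFeatures.fdef.
fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

abacavir = Chem.MolFromSmiles('C1CC1NC2=C3C(=NC(=N2)N)N(C=N3)C4CC(C=C4)CO')
# Topological (bond-count) distance matrix between all atom pairs.
dmat = Chem.GetDistanceMatrix(abacavir)
feats = feat_factory.GetFeaturesForMol(abacavir)


def feature_distance(f1, f2):
    """Hypothetical helper: bonds between the closest atoms of two features."""
    return int(min(dmat[a, b] for a in f1.GetAtomIds() for b in f2.GetAtomIds()))


# Aromatic features span a whole ring, so they report several atom ids.
for feat in feats:
    print(feat.GetFamily(), feat.GetAtomIds())

acceptors = [f for f in feats if f.GetFamily() == 'Acceptor']
aromatics = [f for f in feats if f.GetFamily() == 'Aromatic']
for acc in acceptors:
    # Distance from each acceptor to each aromatic ring feature.
    print(acc.GetAtomIds(), [feature_distance(acc, aro) for aro in aromatics])

families = {f.GetFamily() for f in feats}
```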
Any thoughts/ideas are appreciated.
Thanks
Anthony
=EXCEPTION===
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\rdkit\Chem\Pharm2D\SigFactory.py in GetBitIdx(self, featIndices, dists, sortIndices)
248 print('\tbins:', repr(self._bins), type(self._bins))
--> 249 bin_ = self._findBinIdx(dists, self._bins, self._scaffolds[len(dists)])
250 except ValueError:
~\anaconda3\lib\site-packages\rdkit\Chem\Pharm2D\SigFactory.py in _findBinIdx(self, dists, bins, scaffolds)
167 whichBins[i] = where
--> 168 res = scaffolds.index(tuple(whichBins))
169 if _verbose:
ValueError: (2, 0, 0) is not in list
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
in
29 drug.setTransitionMetalState(containsTransitionMetal(drug.getCanonicalSmiles()))
30 drug.setDrugClass(drugClassStr)
---> 31 drug.setPharm2DFP(Generate.Gen2DFingerprint(drug.getRDKitMol(), sigFactory))
32 drugDictionary.addCaseDrug(drug, drugNameStr)
33
~\anaconda3\lib\site-packages\rdkit\Chem\Pharm2D\Generate.py in Gen2DFingerprint(mol, sigFactory, perms, dMat, bitInfo)
160 for match in matchesToMap:
161 if sigFactory.shortestPathsOnly:
--> 162 idx = _ShortestPathsMatch(match, perm, sig, dMat, sigFactory)
163 if idx is not None and bitInfo is not None:
164 l = bitInfo.get(idx, [])
~\anaconda3\lib\site-packages\rdkit\Chem\Pharm2D\Generate.py in _ShortestPathsMatch(match, featureSet, sig, dMat, sigFactory)
71 dist[i] = d
72
---> 73 idx = sigFactory.GetBitIdx(featureSet, dist, sortIndices=False)
74 if _verbose:
75 print('\t', dist, minD, maxD, idx)
~\anaconda3\lib\site-packages\rdkit\Chem\Pharm2D\SigFactory.py in GetBitIdx(self, featIndices, dists, sortIndices)
251 fams = self.GetFeatFamilies()
252 fams = [fams[x] for x in featIndices]
--> 253 raise IndexError('distance bin not found: feats: %s; dists=%s; bins=%s; scaffolds: %s' %
254 (fams, dists, self._bins, self._scaffolds))
255
IndexError: distance bin not found: feats: ['Acceptor', 'Acceptor',
'Aromatic']; dists=[5, 1, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: [0,
[(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2),
(0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1),
(1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2), (2, 1, 0),
(2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)], 0]
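How to read the error:

* The three distances [5, 1, 1] fall into bins 2, 0 and 0.
* The bin triple (2, 0, 0) is missing from the 3-point scaffold list in the message, so the lookup in _findBinIdx fails.
* As far as I can tell, pruning applies a conservative binned triangle inequality: upper(d1) + upper(d2) >= lower(d3) for every side. Since 2 + 2 < 5, that triple is dropped.
* The combination can still occur in a real molecule. If the feature distance is the shortest path between the closest atoms, a multi-atom Aromatic feature can sit one bond from each of two acceptors that are five bonds apart.

Below is a plain-Python sketch of the lookup and the check. The function names are hypothetical, not RDKit's internals.

```python
from itertools import product

BINS = [(0, 2), (2, 5), (5, 8)]  # the bins passed to SetBins


def find_bin(dist, bins=BINS):
    """Index of the half-open bin [lower, upper) containing dist, or -1."""
    for idx, (lower, upper) in enumerate(bins):
        if lower <= dist < upper:
            return idx
    return -1


def bins_satisfy_triangle(b1, b2, b3):
    """Conservative binned triangle inequality: upper + upper >= lower, each side."""
    return (b1[1] + b2[1] >= b3[0] and
            b2[1] + b3[1] >= b1[0] and
            b3[1] + b1[1] >= b2[0])


def scaffolds(bins=BINS, prune=True):
    """All 3-point bin-index triples, optionally filtered by the triangle check."""
    triples = product(range(len(bins)), repeat=3)
    return [t for t in triples
            if not prune or bins_satisfy_triangle(*(bins[i] for i in t))]


dists = [5, 1, 1]  # taken from the error message
key = tuple(find_bin(d) for d in dists)
print(key)                            # (2, 0, 0)
print(key in scaffolds(prune=True))   # False: pruned, hence the IndexError
print(key in scaffolds(prune=False))  # True
print(len(scaffolds(prune=True)), len(scaffolds(prune=False)))  # 24 27
```

The 24 surviving triples match the 3-point scaffold list in the message. Only (0, 0, 2), (0, 2, 0) and (2, 0, 0) are removed.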
Kind regards
Dr Anthony Nash PhD MRSC
Senior Research Scientist
Nuffield Department of Clinical Neurosciences
RMCR Kellogg College
University of Oxford
http://www.kellogg.ox.ac.uk/
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss