Hi Jan,

GetMorganFingerprint() returns count fingerprints, and the Tanimoto calculation on them does the full Jaccard similarity, including the counts.
The GetMorganFingerprintAsBitVect() version only uses the keys (that is, it treats all non-zero values as being 1) when computing the Tanimoto.

> On Sep 14, 2019, at 11:07, Jan Halborg Jensen <jhjen...@chem.ku.dk> wrote:
>
> When using GetMorganFingerprintAsBitVect I get the “expected” Tanimoto score
>
> mol1 = Chem.MolFromSmiles('CCC')
> mol2 = Chem.MolFromSmiles('CNC')
>
> fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1,2,nBits=1024)
> fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2,2,nBits=1024)

>>> list(fp1.GetOnBits())
[33, 80, 294, 320]
>>> list(fp2.GetOnBits())
[33, 128, 406, 539]

You can see the intersection is 1 and the union is 7, giving 1/7 = 0.142... as the Tanimoto, which matches the result you reported.

> However, when using GetMorganFingerprint I get a different score.
>
> fp1 = AllChem.GetMorganFingerprint(mol1,2)
> fp2 = AllChem.GetMorganFingerprint(mol2,2)

>>> fp1.GetNonzeroElements()
{2068133184: 1, 2245384272: 1, 2246728737: 2, 3542456614: 2}
>>> fp2.GetNonzeroElements()
{847961216: 1, 869080603: 1, 2246728737: 2, 3824063894: 2}

Note that there is one shared key (2246728737), while the other 6 keys each appear in only one fingerprint, so the union again has 7 keys. The binary Tanimoto, which treats all counts as 1, therefore gives 1/7, matching the BitVect version.

The count version also takes the values into account. The shared key 2246728737 is present 2 times in each fingerprint, so the common count is 2, and each fingerprint has a total count of 1+1+2+2 = 6 (3542456614 and 3824063894 are each present twice, in one fingerprint each). The Jaccard, or count Tanimoto, is then

    2 / ((1+1+2+2) + (1+1+2+2) - 2) = 2/10 = 0.2

matching the value you computed.

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
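
P.S. In case it helps to check the arithmetic, here is a minimal pure-Python sketch of the two calculations, using the GetNonzeroElements() dictionaries shown above. The function names binary_tanimoto and count_tanimoto are made up for this illustration and are not RDKit functions. The count version assumes the formula used in the worked numbers above (sum of per-key minimums, divided by the two totals minus that sum); it is not copied from the RDKit source.

```python
# Sparse count fingerprints copied from the GetNonzeroElements() output above.
fp1 = {2068133184: 1, 2245384272: 1, 2246728737: 2, 3542456614: 2}
fp2 = {847961216: 1, 869080603: 1, 2246728737: 2, 3824063894: 2}


def binary_tanimoto(a, b):
    """Tanimoto on the keys only: every non-zero count is treated as 1."""
    shared = len(a.keys() & b.keys())
    union = len(a.keys() | b.keys())
    return shared / union


def count_tanimoto(a, b):
    """Count-based Tanimoto (hypothetical helper): sum of per-key minimums
    divided by (total of a + total of b - sum of per-key minimums)."""
    common = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return common / (sum(a.values()) + sum(b.values()) - common)


print(round(binary_tanimoto(fp1, fp2), 3))  # 0.143, i.e. 1/7
print(round(count_tanimoto(fp1, fp2), 3))   # 0.2, i.e. 2/10
```

Neither helper guards against two empty fingerprints, which would divide by zero; that case does not arise for these molecules.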