[Rdkit-discuss] Incorrect results for substructure search obtained with Tversky similarity.

Axel Rudling Mon, 12 Dec 2016 08:31:15 -0800

Hello all,

I'm currently working on a project that uses Tversky similarity in
substructure mode, with fingerprints generated from SMILES.


For most molecules I get the correct result, but for some molecules I get a
large number of false positives: the substructure search returns many
compounds that are not actually substructures of the query compound. I'm not
certain why, but it might have to do with the FP representation, as these
molecules have a very unusual cyclic structure, e.g.:

C1C[NH2+]CCC[NH2+]CCCNCCC[NH2+]C1


I use 2048-bit ECFP4 fingerprints.

tverskySim = DataStructs.TverskySimilarity(ffp1,ffp2,1.0,0.0)
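
To make clear what this call computes: with alpha = 1.0 and beta = 0.0 the
Tversky index reduces to the fraction of the first fingerprint's on-bits that
are also set in the second one. A minimal sketch in plain Python follows,
assuming the standard set-based Tversky definition. The helper tversky() and
the bit sets are hypothetical illustrations, not real ECFP4 output and not
RDKit code.

```python
def tversky(a_bits, b_bits, alpha, beta):
    """Tversky index on two sets of on-bit indices (set-based definition)."""
    common = len(a_bits & b_bits)
    denom = common + alpha * len(a_bits - b_bits) + beta * len(b_bits - a_bits)
    return common / denom if denom else 0.0


# Hypothetical on-bit sets, chosen only for illustration.
query_bits = {3, 17, 42}             # a fingerprint with few on-bits
target_bits = {3, 17, 42, 99, 512}   # contains every bit of query_bits
other_bits = {3, 17, 700}            # misses one bit of query_bits

# All query bits are present in the target, so the score is 1.0.
print(tversky(query_bits, target_bits, 1.0, 0.0))

# One query bit is missing, so the score drops below 1.0.
print(tversky(query_bits, other_bits, 1.0, 0.0))

# The measure is asymmetric: swapping the arguments changes the result.
print(tversky(target_bits, query_bits, 1.0, 0.0))
```

A score of 1.0 here only says that the first bit set is contained in the
second. For a folded fingerprint this is a necessary condition for a
substructure match but not a sufficient one, so a fingerprint with very few
on-bits could plausibly score 1.0 against many unrelated molecules.
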

Does anyone have an idea?


best

Axel


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
