Is it possible to use the bulk similarity searching functionality for better performance instead of the list comprehension?
Best,
Peter

On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum <greg.land...@gmail.com> wrote:

No worries. This, and Anna's question about similarity searching and clustering, illustrate a great opportunity for a tutorial on fingerprints and similarity searching.

-greg

On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" <sw...@mac.com> wrote:

Thanks for this. As a chemist who comes from the “cut and paste” school of scripting, I’m always concerned I’m asking something blindingly obvious ;-)

Chris

On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:

[including rdkit-discuss, because it's relevant there and I'm pretty sure Chris won't mind and the real Pandas experts may have a better answer than me.]

On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:

I quite like storing molecules and associated data in a data frame, and I’ve seen that it is possible to use rdkit for substructure searching. Is it also possible to do similarity searching?

It's not built in, since there are many possible fingerprints that could be used. It's not quite as convenient as the substructure search, but here's a little demo of what you can do to filter based on similarity:

# Start by adding a fingerprint column:
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]

# and now filter:
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]

In [23]: len(df)
Out[23]: 1000

In [24]: len(ndf)
Out[24]: 2

-greg

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
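Greg's session relies on a frame that already has an ROMol column, on a query fingerprint qry defined earlier, and on imports that aren't shown. Here is a minimal self-contained sketch of the same fingerprint-column-plus-apply filter, assuming a few made-up SMILES in place of the 1000-molecule frame and ethanol's fingerprint as the query (both hypothetical):

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Hypothetical stand-in data; in the thread, df already held an 'ROMol'
# column of RDKit molecules (1000 rows).
df = pd.DataFrame({"SMILES": ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1O"]})
df["ROMol"] = [Chem.MolFromSmiles(s) for s in df["SMILES"]]

# Start by adding a fingerprint column (Morgan, radius 2, as a bit vector):
df["mfp2"] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x, 2) for x in df["ROMol"]]

# Assumed query: the fingerprint of a reference molecule, here ethanol.
qry = rdMolDescriptors.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles("CCO"), 2)

# and now filter: apply() calls TanimotoSimilarity once per row and
# returns a boolean mask of rows with similarity >= 0.7.
ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x["mfp2"], qry) >= 0.7, axis=1)]
print(len(df), len(ndf))
```

The 0.7 cutoff is the one from the demo; which molecules pass it depends on the fingerprint type and radius chosen.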
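On the bulk question: RDKit's DataStructs.BulkTanimotoSimilarity(query, fps) takes one query fingerprint and a list of fingerprints and returns the Tanimoto similarities in the same order, so the per-row apply/lambda can be replaced by one call plus an ordinary boolean mask. A minimal sketch, assuming hypothetical data (a few made-up SMILES, ethanol as the query). I haven't benchmarked it here, but because the loop over fingerprints runs inside RDKit instead of through a Python lambda per row, it would be expected to be faster on large frames:

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Hypothetical example data standing in for a real molecule frame.
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1O", "CC(=O)O"]
df = pd.DataFrame({"SMILES": smiles})
df["ROMol"] = [Chem.MolFromSmiles(s) for s in df["SMILES"]]

# Same fingerprint column as in the demo: Morgan, radius 2, bit vectors.
df["mfp2"] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(m, 2) for m in df["ROMol"]]

# Hypothetical query: ethanol.
qry = rdMolDescriptors.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles("CCO"), 2)

# One call computes the query's similarity to every fingerprint in the list;
# the result is in the same order as the rows, so it can become a column.
df["sim"] = DataStructs.BulkTanimotoSimilarity(qry, list(df["mfp2"]))

# Filter with a plain vectorized comparison on the new column.
ndf = df[df["sim"] >= 0.7]
print(ndf[["SMILES", "sim"]])
```

Keeping the similarities in a column also makes it easy to rank the hits, e.g. ndf.sort_values("sim", ascending=False).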