[including rdkit-discuss, because it's relevant there and I'm pretty sure Chris won't mind and the real Pandas experts may have a better answer than me.]
On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:
>
> I quite like storing molecules and associated data in a data frame, and
> I’ve seen that it is possible to use rdkit for substructure searching. Is it
> possible to also do similarity searching?
>

It's not built in, since there are many possible fingerprints that could be used. It's not quite as convenient as the substructure search, but here's a little demo of what you can do to filter based on similarity:

# Start by adding a fingerprint column:
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]

# and now filter (qry is the query molecule's fingerprint, generated the same way):
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]

In [23]: len(df)
Out[23]: 1000

In [24]: len(ndf)
Out[24]: 2

-greg
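
For readers wondering what the 0.7 cutoff in the filter above actually measures: Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either one. Here is a minimal, library-free sketch of that calculation and the threshold filter. The bit sets and the names mol_a, mol_b, mol_c are made up for illustration and stand in for the 'mfp2' column; `query` plays the role of `qry`. The value returned for two empty fingerprints is a convention chosen for this sketch, not necessarily what RDKit does.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Assumption for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / union


# Hypothetical on-bit sets standing in for the fingerprint column.
fingerprints = {
    "mol_a": frozenset({1, 5, 9, 12}),
    "mol_b": frozenset({1, 5, 9, 12, 20}),
    "mol_c": frozenset({2, 3, 40}),
}

# Hypothetical query fingerprint (the role `qry` plays in the demo).
query = frozenset({1, 5, 9, 12})

# Keep only the entries whose similarity to the query is at least 0.7.
hits = [name for name, fp in fingerprints.items() if tanimoto(fp, query) >= 0.7]
print(hits)
```

With real data, the sets would be replaced by RDKit bit vectors and `tanimoto` by `DataStructs.TanimotoSimilarity`, as in the session above.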