[including rdkit-discuss, because it's relevant there and I'm pretty sure
Chris won't mind and the real Pandas experts may have a better answer than
me.]

On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:

>
> I quite like storing molecules and associated data in a data frame, and
> I’ve seen that it is possible to use rdkit for substructure searching. Is it
> also possible to do similarity searching?
>

It's not built in since there are many possible fingerprints that could be
used.
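
Whichever fingerprint is chosen, the filter further down compares bit
vectors with the Tanimoto coefficient: the number of bits set in both
fingerprints divided by the number set in either. A minimal plain-Python
sketch of that calculation, using sets of on-bit indices as stand-ins for
real fingerprints (the `tanimoto` helper, the example sets, and the choice
to return 0.0 for two empty sets are illustrative assumptions, not RDKit
code):

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Convention chosen for this sketch only: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


# Hypothetical fingerprints, written as the indices of their set bits.
fp_query = {1, 4, 7, 9}
fp_close = {1, 4, 7, 9, 12}
fp_far = {2, 3, 9}

print(tanimoto(fp_query, fp_close))  # 4 shared bits out of 5 distinct bits
print(tanimoto(fp_query, fp_far))    # 1 shared bit out of 6 distinct bits
```

With a 0.7 cutoff, `fp_close` would pass the filter and `fp_far` would not.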

It's not quite as convenient as the substructure search, but here's a
little demo of what you can do to filter based on similarity:

# Start by adding a fingerprint column
# (needs: from rdkit import DataStructs; from rdkit.Chem import rdMolDescriptors):
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x, 2)
    ...:               for x in df['ROMol']]

# and now filter (qry is the query molecule's fingerprint, generated the same way):
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'], qry) >= 0.7,
    ...:                   axis=1)]

In [23]: len(df)
Out[23]: 1000
In [24]: len(ndf)
Out[24]: 2
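
For anyone who wants to try this end to end, here is a self-contained
sketch of the same idea. The SMILES strings, the query molecule, and the
'smiles' and 'sim' column names are made up for illustration. It also uses
DataStructs.BulkTanimotoSimilarity to score the whole column in one call
instead of df.apply, which is just one possible variation on the demo
above:

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Small made-up data set; any frame with a molecule column would do.
smiles = ['c1ccccc1O', 'c1ccccc1N', 'CCO', 'Cc1ccccc1O']
df = pd.DataFrame({'smiles': smiles})
df['ROMol'] = [Chem.MolFromSmiles(s) for s in df['smiles']]

# Fingerprint column, as in the demo above (Morgan, radius 2).
df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(m, 2)
              for m in df['ROMol']]

# The query fingerprint must be generated the same way as the column.
qry = rdMolDescriptors.GetMorganFingerprintAsBitVect(
    Chem.MolFromSmiles('c1ccccc1O'), 2)

# Score every row against the query, then filter on the threshold.
df['sim'] = DataStructs.BulkTanimotoSimilarity(qry, list(df['mfp2']))
ndf = df[df['sim'] >= 0.7]

print(ndf[['smiles', 'sim']])
```

Keeping the scores in their own column also makes it easy to sort the hits
or try a different cutoff without recomputing anything.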

-greg