Is it possible to use the bulk similarity searching functionality for
better performance instead of the list comprehension?

Best,

Peter


On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum <greg.land...@gmail.com> wrote:

No worries.
This, and Anna's question about similarity searching and clustering,
illustrate a great opportunity for a tutorial on fingerprints and
similarity searching.

-greg





On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" <sw...@mac.com> wrote:

Thanks for this,

As a chemist who comes from the “cut and paste” school of scripting I’m
always concerned I’m asking something blindingly obvious

;-)

Chris

On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:

[including rdkit-discuss, because it's relevant there and I'm pretty sure
Chris won't mind and the real Pandas experts may have a better answer than
me.]

On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:


I quite like storing molecules and associated data in a data frame and I’ve
seen that it is possible to use rdkit for substructure searching. Is it
possible to also do similarity searching?


It's not built in since there are many possible fingerprints that could be
used.

It's not quite as convenient as the substructure search, but here's a
little demo of what you can do to filter based on similarity:

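# Assumes: from rdkit import DataStructs
#          from rdkit.Chem import rdMolDescriptors
# qry is the query fingerprint, generated the same way as the column below.

# Start by adding a fingerprint column:
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x, 2) for x in df['ROMol']]

# and now filter:
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'], qry) >= 0.7, axis=1)]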

In [23]: len(df)
Out[23]: 1000
In [24]: len(ndf)
Out[24]: 2
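
For larger frames, the row-wise apply can probably be replaced by a single
call to DataStructs.BulkTanimotoSimilarity, which scores one query
fingerprint against a whole list. What follows is only a minimal sketch: the
toy SMILES, the morgan_fp helper and the 'sim' column name are hypothetical
stand-ins, not part of the session above.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Hypothetical toy data standing in for the 1000-row frame above.
smiles = ["CCO", "CCCO", "c1ccccc1", "CC(=O)O"]
df = pd.DataFrame({"smiles": smiles})
df["ROMol"] = [Chem.MolFromSmiles(s) for s in df["smiles"]]


def morgan_fp(mol, radius=2):
    """Hypothetical helper: Morgan bit-vector fingerprint, as in the demo."""
    return rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius)


# Fingerprint column, built the same way as 'mfp2' above.
df["mfp2"] = [morgan_fp(m) for m in df["ROMol"]]

# Query fingerprint (ethanol is used here purely for illustration).
qry = morgan_fp(Chem.MolFromSmiles("CCO"))

# One bulk call returns a list with one similarity per fingerprint.
df["sim"] = DataStructs.BulkTanimotoSimilarity(qry, list(df["mfp2"]))

# Filter on the precomputed similarity column.
ndf = df[df["sim"] >= 0.7]
```

Keeping the similarities in their own column also makes it easy to sort by
score or to try a different cutoff without recomputing anything.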

-greg


------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
