Hi Pat,

Hmm, I get the same error as you.

By the way, I had to change the code to use this import
from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
to avoid another error.
Which version of RDKit are you using?

BR

Guillaume


From: Patrick Walters <wpwalt...@gmail.com>
Date: Monday, 22 March 2021 at 14:20
To: Guillaume GODIN <guillaume.go...@firmenich.com>
Cc: rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask

The input is just a SMILES string and a molecule name separated by a space,
one molecule per line. I've attached an example.
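
To make the layout described above concrete, here is a minimal sketch of reading such a file with the standard library only. The two sample molecules and the helper name read_smiles_lines are illustrative assumptions, not taken from the attached file.

```python
import io

# Hypothetical two-line sample in the "SMILES name" layout described above.
SAMPLE = "CCO ethanol\nc1ccccc1 benzene\n"


def read_smiles_lines(handle):
    """Yield (smiles, name) pairs from lines of the form 'SMILES name'."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only, so a name may itself contain spaces.
        smiles, _, name = line.partition(" ")
        yield smiles, name.strip()


records = list(read_smiles_lines(io.StringIO(SAMPLE)))
print(records)
```

Splitting on the first space only is a design choice made for this sketch; a name containing spaces would otherwise spill into extra columns.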

Thanks,

Pat


On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN
<guillaume.go...@firmenich.com> wrote:
Hi Pat,

Do you have a small example file I could run this on, or can I use esol.csv,
for example?

Thanks

Guillaume

From: Patrick Walters <wpwalt...@gmail.com>
Date: Monday, 22 March 2021 at 13:51
To: rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Subject: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask

Apologies, there was a bug in the code I sent in my previous message.  The 
problem is the same.  Here is the corrected code in a gist.

https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd



On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com> wrote:
Hi All,

I've been trying to calculate BCUT2D descriptors in parallel with Dask and get 
this error with the code below.
TypeError: cannot pickle 'Boost.Python.function' object

Everything works if I call mw_df, which calculates molecular weight, but I get 
the error above if I call bcut_df.  Does anyone have a workaround?

Thanks,

Pat

#!/usr/bin/env python

import sys
import dask.dataframe as dd
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Descriptors import MolWt
from rdkit.Chem.rdMolDescriptors import BCUT2D
import time

# --  molecular weight functions
def calc_mw(smi):
    mol = Chem.MolFromSmiles(smi)
    return MolWt(mol)

def mw_df(df):
    return df.SMILES.apply(calc_mw)

# -- bcut functions
def bcut_df(df):
    return df.apply(calc_bcut)

def calc_bcut(smi):
    mol = Chem.MolFromSmiles(smi)
    return BCUT2D(mol)

def main():
    start = time.time()
    df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
    ddf = dd.from_pandas(df,npartitions=16)
    ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
    ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
    print(time.time()-start)
    print(ddf.head())


if __name__ == "__main__":
    main()
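
For the pickling error described above, one workaround that may be worth trying is to import the RDKit functions inside the worker function instead of at module level. This is a sketch under an assumption, not a confirmed fix for this setup: the assumption is that the error comes from Dask serializing the function's global reference to BCUT2D (a Boost.Python function) when it ships work to other processes. The None return for unparsable SMILES is also an addition for illustration.

```python
def calc_bcut(smi):
    """Return the BCUT2D values for one SMILES, or None if it does not parse."""
    # Assumption: importing here keeps the Boost.Python function out of this
    # function's globals, so only plain Python code has to be serialized and
    # sent to the worker processes.
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import BCUT2D

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        return None
    return BCUT2D(mol)


def bcut_df(df):
    """Apply calc_bcut to the SMILES column of one partition."""
    return df.SMILES.apply(calc_bcut)
```

BCUT2D returns a list of values per molecule rather than a single float, so the meta argument passed to map_partitions would probably need to describe an object column instead of 'float'.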
***********************************************************************************
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded 
copies (which may contain alterations) subsequently transmitted from Firmenich, 
are confidential and solely for the use of the intended recipient. The contents 
do not represent the opinion of Firmenich except to the extent that it relates 
to their official business.
***********************************************************************************

