Thanks, Greg.  Yutong Zhao sent me the same solution and I was just about
to post his fix to the list.  It's funny how I posted to the list and a
colleague had the answer.

Thanks all, the RDKit community is awesome!
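
For anyone who finds this thread later, here is a minimal sketch of the check Greg does with type() below. It uses only the standard library. `describe_callable` is a hypothetical helper, not an RDKit or dask API. The pickle test is only a rough proxy: as far as I know, dask's process scheduler serializes tasks with cloudpickle, not plain pickle.

```python
import pickle
import types


def describe_callable(obj):
    """Rough pre-flight check before sending a callable to worker processes.

    Returns (is_plain_function, stdlib_picklable):
      - is_plain_function: True only for functions written in Python
        (types.FunctionType). Extension-module callables, such as a
        Boost.Python.function, report False.
      - stdlib_picklable: whether the standard-library pickle module can
        serialize the object. This is only a proxy, because dask's process
        scheduler may use a different serializer (assumed: cloudpickle).
    """
    is_plain_function = isinstance(obj, types.FunctionType)
    try:
        pickle.dumps(obj)
        stdlib_picklable = True
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        stdlib_picklable = False
    return is_plain_function, stdlib_picklable


if __name__ == "__main__":
    # Builtin function: not a Python-level function, but pickled by reference.
    print(describe_callable(len))          # -> (False, True)
    # Lambda: a Python-level function, but it has no importable name.
    print(describe_callable(lambda x: x))  # -> (True, False)
```

With RDKit installed, describe_callable(BCUT2D) should report False in the first position. That matches the Boost.Python.function type Greg shows.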

On Mon, Mar 22, 2021 at 9:55 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Pat,
>
> Solution: either change your calc_bcut function to:
> def calc_bcut(smi):
>     from rdkit.Chem.rdMolDescriptors import BCUT2D
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> or change the import on line 8 at the top to:
> from rdkit.Chem import rdMolDescriptors
>
> and do:
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return rdMolDescriptors.BCUT2D(mol)
>
> The second approach is probably more efficient.
>
> I'm not 100% sure what's happening, but it looks like dask is trying to
> somehow package up whatever is being used in calc_bcut() and is having a
> problem when it sees the BCUT2D object, which is a Boost.Python.function
> instead of a normal Python function:
>
> In [3]: type(MolWt)
> Out[3]: function
>
> In [4]: type(BCUT2D)
> Out[4]: Boost.Python.function
>
> By either explicitly doing the import in calc_bcut() or referencing the
> function through the module, dask seems to be able to figure out how to do
> the right thing.
>
> -greg
> p.s. in case you see different behavior:
> In [2]: dask.__version__
> Out[2]: '2020.12.0'
>
> On Mon, Mar 22, 2021 at 1:51 PM Patrick Walters <wpwalt...@gmail.com>
> wrote:
>
>> Apologies, there was a bug in the code I sent in my previous message.
>> The problem is the same.  Here is the corrected code in a gist.
>>
>> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>>
>> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I've been trying to calculate BCUT2D descriptors in parallel with Dask
>>> and I get this error with the code below:
>>> TypeError: cannot pickle 'Boost.Python.function' object
>>>
>>> Everything works if I call mw_df, which calculates molecular weight, but
>>> I get the error above if I call bcut_df.  Does anyone have a workaround?
>>>
>>> Thanks,
>>>
>>> Pat
>>>
>>> #!/usr/bin/env python
>>>
>>> import sys
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> from rdkit import Chem
>>> from rdkit.Chem.Descriptors import MolWt
>>> from rdkit.Chem.rdMolDescriptors import BCUT2D
>>> import time
>>>
>>> # --  molecular weight functions
>>> def calc_mw(smi):
>>>     mol = Chem.MolFromSmiles(smi)
>>>     return MolWt(mol)
>>>
>>> def mw_df(df):
>>>     return df.SMILES.apply(calc_mw)
>>>
>>> # -- bcut functions
>>> def bcut_df(df):
>>>     # apply to the SMILES column, as mw_df does (df.apply would pass whole columns)
>>>     return df.SMILES.apply(calc_bcut)
>>>
>>> def calc_bcut(smi):
>>>     mol = Chem.MolFromSmiles(smi)
>>>     return BCUT2D(mol)
>>>
>>> def main():
>>>     start = time.time()
>>>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>>>     ddf = dd.from_pandas(df,npartitions=16)
>>>     ddf['MW'] = ddf.map_partitions(mw_df, meta='float').compute(scheduler='processes')
>>>     ddf['BCUT'] = ddf.map_partitions(bcut_df, meta='float').compute(scheduler='processes')
>>>     print(time.time()-start)
>>>     print(ddf.head())
>>>
>>>
>>> if __name__ == "__main__":
>>>     main()
>>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
