Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Maciek Wójcikowski
Hi Pat,

What I have found useful in the past is to put the imports inside the
functions that Dask runs. Not very elegant, but it works.

Best,
Maciek
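
A minimal sketch of one way to generalise this advice, under stated assumptions: rather than importing a compiled function at module level, pass its module path and name as strings and resolve them inside the function the worker runs. The helper name `call_by_name` is hypothetical, and `math.sqrt` stands in for an RDKit descriptor so the snippet runs without RDKit installed.

```python
import importlib
from functools import partial


def call_by_name(module_name, func_name, *args, **kwargs):
    """Import module_name at call time and invoke func_name from it.

    Hypothetical helper: the import happens inside the function body, so
    the target function is looked up where the call runs instead of being
    captured from the process that submitted the work.
    """
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    return func(*args, **kwargs)


# Stand-in demonstration using only the standard library.
sqrt_by_name = partial(call_by_name, "math", "sqrt")
result = sqrt_by_name(9.0)
print(result)

# With RDKit this might become (an assumption, not run here):
# bcut_by_name = partial(call_by_name, "rdkit.Chem.rdMolDescriptors", "BCUT2D")
```

The same effect can be had with a plain `from ... import ...` statement as the first line of the function; the string-based variant is only convenient when several descriptor functions share one wrapper.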

On Mon, 22 Mar 2021, 14:30, Patrick Walters wrote:

> 2020.09.5
>
> On Mon, Mar 22, 2021 at 9:24 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>> Hi Pat,
>>
>> Hmm, I get the same error as you.
>>
>> By the way, I had to change the code to use this import
>>
>> from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
>>
>> to avoid another error.
>>
>> Which version of RDKit are you using?
>>
>> BR
>>
>> Guillaume
>>
>> From: Patrick Walters
>> Date: Monday, 22 March 2021 at 14:20
>> To: Guillaume GODIN
>> Cc: rdkit-discuss
>> Subject: Re: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>>
>> The input is just a SMILES string and a molecule name separated by a
>> space. I've attached an example.
>>
>> Thanks,
>>
>> Pat
>>
>> On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <
>> guillaume.go...@firmenich.com> wrote:
>>
>> Hi Pat,
>>
>> Do you have a small example file to work with, or can I use esol.csv,
>> for example?
>>
>> Thanks
>>
>> Guillaume
>>
>> From: Patrick Walters
>> Date: Monday, 22 March 2021 at 13:51
>> To: rdkit-discuss
>> Subject: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>>
>> Apologies, there was a bug in the code I sent in my previous message.
>> The problem is the same.  Here is the corrected code in a gist.
>>
>> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>>
>> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters 
>> wrote:
>>
>> Hi All,
>>
>> I've been trying to calculate BCUT2D descriptors in parallel with Dask
>> and get this error with the code below.
>>
>> TypeError: cannot pickle 'Boost.Python.function' object
>>
>> Everything works if I call mw_df, which calculates molecular weight, but
>> I get the error above if I call bcut_df.  Does anyone have a workaround?
>>
>> Thanks,
>>
>> Pat
>>
>> #!/usr/bin/env python
>>
>> import sys
>> import dask.dataframe as dd
>> import pandas as pd
>> from rdkit import Chem
>> from rdkit.Chem.Descriptors import MolWt
>> from rdkit.Chem.rdMolDescriptors import BCUT2D
>> import time
>>
>> # -- molecular weight functions
>> def calc_mw(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return MolWt(mol)
>>
>> def mw_df(df):
>>     return df.SMILES.apply(calc_mw)
>>
>> # -- bcut functions
>> def bcut_df(df):
>>     return df.apply(calc_bcut)
>>
>> def calc_bcut(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return BCUT2D(mol)
>>
>> def main():
>>     start = time.time()
>>     df = pd.read_csv(sys.argv[1], sep=" ", names=["SMILES", "Name"])
>>     ddf = dd.from_pandas(df, npartitions=16)
>>     ddf['MW'] = ddf.map_partitions(mw_df, meta='float').compute(scheduler='processes')
>>     ddf['BCUT'] = ddf.map_partitions(bcut_df, meta='float').compute(scheduler='processes')
>>     print(time.time() - start)
>>     print(ddf.head())
>>
>> if __name__ == "__main__":
>>     main()
>>
>> ***
>> DISCLAIMER
>> This email and any files transmitted with it, including replies and
>> forwarded copies (which may contain alterations) subsequently transmitted
>> from Firmenich, are confidential and solely for the use of the intended
>> recipient. The contents do not represent the opinion of Firmenich except to
>> the extent that it relates to their official business.
>>
>> ***
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Peter St. John
Do you still get the error if you move the import into the function body?

def calc_bcut(smi):
    from rdkit.Chem.rdMolDescriptors import BCUT2D
    mol = Chem.MolFromSmiles(smi)
    return BCUT2D(mol)
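
One way to narrow down which object triggers a "cannot pickle" error is to try pickling candidate objects directly. The sketch below uses only the standard library. `is_picklable` is a hypothetical helper, and the thread lock is merely a stand-in for an object that cannot be pickled. Dask's process scheduler has its own serialization layer, so a result from plain `pickle` is only a rough indication of what Dask will accept.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle can raise PicklingError, TypeError or AttributeError
        # depending on the object, so catch broadly for this check.
        return False
    return True


builtin_ok = is_picklable(len)            # built-in, pickled by reference
lock_ok = is_picklable(threading.Lock())  # locks cannot be pickled
print(builtin_ok, lock_ok)
```

With RDKit installed, calling `is_picklable(BCUT2D)` on the module-level import would be the analogous check; that call is an assumption and was not run here.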




Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
2020.09.5

On Mon, Mar 22, 2021 at 9:24 AM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Hi Pat,
>
> Hmm, I get the same error as you.
>
> By the way, I had to change the code to use this import
>
> from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
>
> to avoid another error.
>
> Which version of RDKit are you using?
>
> BR
>
> Guillaume


Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Guillaume GODIN via Rdkit-discuss
Hi Pat,

Hmm, I get the same error as you.

By the way, I had to change the code to use this import
from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
to avoid another error.
Which version of RDKit are you using?

BR

Guillaume




Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Guillaume GODIN via Rdkit-discuss
Hi Pat,

Do you have a small example file to work with, or can I use esol.csv, for
example?

Thanks

Guillaume



Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
The input is just a SMILES string and a molecule name separated by a
space. I've attached an example.

Thanks,

Pat
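
The input format described above can be sketched as follows. This is a minimal standard-library reader, not the approach in the original script, which uses `pd.read_csv` with `sep=" "` and `names=["SMILES", "Name"]`. The function name `read_smiles_lines` and the two example records are made up for illustration and are not taken from the attached file.

```python
import io


def read_smiles_lines(handle):
    """Yield (smiles, name) tuples from lines of the form 'SMILES name'."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so a name may
        # itself contain spaces.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        yield smiles, name


# Made-up example records in the "SMILES name" layout.
example = io.StringIO("CCO ethanol\nc1ccccc1 benzene\n")
records = list(read_smiles_lines(example))
print(records)
```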




zinc_100.smi
Description: Binary data