Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
Hi Pat,

What I have found useful in the past is to put the imports inside the functions that Dask runs. Not very elegant, but it works.

Best,
Maciek

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
Do you still get the error if you move the import into the function body?

    def calc_bcut(smi):
        from rdkit.Chem.rdMolDescriptors import BCUT2D
        mol = Chem.MolFromSmiles(smi)
        return BCUT2D(mol)
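A slightly fuller sketch of the suggestion above, for anyone who wants to try it. It assumes the "SMILES" column name from the original script and defers the `Chem` import as well. One plausible reading of why this helps (an assumption, not something confirmed in this thread) is that `BCUT2D` then becomes a local name inside `calc_bcut`, so the function no longer refers to a module-level Boost.Python function that would have to be serialized and sent to the worker processes. The `None` check for unparsable SMILES is an addition for illustration.

```python
def calc_bcut(smi):
    # Deferred imports: these run inside the worker process at call time,
    # so BCUT2D and Chem are locals here rather than module-level globals.
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import BCUT2D

    mol = Chem.MolFromSmiles(smi)
    # MolFromSmiles returns None when the SMILES cannot be parsed.
    return BCUT2D(mol) if mol is not None else None


def bcut_df(df):
    # Apply the row-level function to the SMILES column of one partition.
    # "SMILES" is the column name used in the original script.
    return df.SMILES.apply(calc_bcut)
```

A quick way to see the difference without running RDKit at all: `"BCUT2D" in calc_bcut.__code__.co_varnames` is true (it is a local variable), while `"BCUT2D" in calc_bcut.__globals__` is false as long as nothing imports it at module level.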
Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
2020.09.5

On Mon, Mar 22, 2021 at 9:24 AM Guillaume GODIN <guillaume.go...@firmenich.com> wrote:
> Which version of rdkit do you use ?
Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
Hi Pat,

Hmm, I get the same error as you.

By the way, I had to change the code to use this import

    from rdkit.Chem.rdMolDescriptors import CalcExactMolWt

to avoid another error.

Which version of rdkit do you use?

BR

Guillaume
Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
Hi Pat,

Do you have a small example file to work with, or can I use esol.csv, for example?

Thanks

Guillaume

From: Patrick Walters
Date: Monday, 22 March 2021 at 13:51
To: rdkit-discuss
Subject: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask

Apologies, there was a bug in the code I sent in my previous message. The problem is the same. Here is the corrected code in a gist.

https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd

On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com> wrote:

Hi All,

I've been trying to calculate BCUT2D descriptors in parallel with Dask and get this error with the code below.

    TypeError: cannot pickle 'Boost.Python.function' object

Everything works if I call mw_df, which calculates molecular weight, but I get the error above if I call bcut_df. Does anyone have a workaround?

Thanks,

Pat

    #!/usr/bin/env python

    import sys
    import dask.dataframe as dd
    import pandas as pd
    from rdkit import Chem
    from rdkit.Chem.Descriptors import MolWt
    from rdkit.Chem.rdMolDescriptors import BCUT2D
    import time

    # -- molecular weight functions
    def calc_mw(smi):
        mol = Chem.MolFromSmiles(smi)
        return MolWt(mol)

    def mw_df(df):
        return df.SMILES.apply(calc_mw)

    # -- bcut functions
    def bcut_df(df):
        return df.apply(calc_bcut)

    def calc_bcut(smi):
        mol = Chem.MolFromSmiles(smi)
        return BCUT2D(mol)

    def main():
        start = time.time()
        df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
        ddf = dd.from_pandas(df,npartitions=16)
        ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
        ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
        print(time.time()-start)
        print(ddf.head())


    if __name__ == "__main__":
        main()
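For narrowing down which object triggers an error like the TypeError in the original post, a small diagnostic along these lines may help. This is only a sketch: `pickle_error` is a hypothetical helper name, and it uses the standard-library `pickle`, whereas Dask's process-based scheduler (as far as I know) serializes tasks with cloudpickle, so the two can disagree in some cases.

```python
import pickle


def pickle_error(obj):
    """Return None if obj pickles with the standard library,
    otherwise a short description of the error that was raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # the exception type varies by object and Python version
        return f"{type(exc).__name__}: {exc}"
    return None


# Built-in functions are pickled by reference, so this reports no error.
print(pickle_error(len))
# A lambda cannot be pickled by the standard library, so this reports an error.
print(pickle_error(lambda x: x))
```

With RDKit installed, calling `pickle_error(MolWt)` and `pickle_error(BCUT2D)` on the two functions imported in the script would show whether they behave differently under plain `pickle`.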
Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask
The input is just SMILES and molecule name separated by a space. I've attached an example.

Thanks,

Pat

Attachment: zinc_100.smi (binary data)
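To make the input format concrete, here is a minimal sketch that reads lines of the form "SMILES, a space, then the name" using only the standard library. The function name `read_smiles` and the two sample lines are made up for illustration; they are not taken from the attached zinc_100.smi.

```python
import io


def read_smiles(handle):
    """Yield (smiles, name) pairs from lines of the form '<SMILES> <name>'."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only, so a name containing spaces stays whole.
        smiles, _, name = line.partition(" ")
        yield smiles, name.strip()


# Hypothetical sample input in the described format.
sample = io.StringIO("CCO ethanol\nc1ccccc1 benzene\n")
records = list(read_smiles(sample))
print(records)
```

The original script reads the same layout with `pd.read_csv(..., sep=" ", names=["SMILES", "Name"])`; the version here is just a dependency-free way to inspect a file before handing it to pandas and Dask.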