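By default, plain RDKit molecules do not carry their properties along when they
are pickled, and Python multiprocessing uses pickling to transfer mols to the
worker processes. That is why the SD properties are missing in the worker.
Wrapping each mol in a PropertyMol should solve the issue: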
http://www.rdkit.org/Python_Docs/rdkit.Chem.PropertyMol.PropertyMol-class.html
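
A minimal sketch of the idea, assuming a small molecule built from SMILES as a
stand-in for a record read from the SD file (the SMILES string and the ID value
here are illustrative only). A pickle round trip mimics what a Pool worker
receives:

```python
import pickle

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

# Illustrative stand-in for one entry of the SDMolSupplier.
mol = Chem.MolFromSmiles("CCO")
mol.SetProp("ChemDiv_IDNUMBER", "000L-0408")

# Wrap the mol so that its properties are included when it is pickled.
wrapped = PropertyMol(mol)

# multiprocessing pickles arguments to send them to workers;
# dumps/loads reproduces that transfer within a single process.
received = pickle.loads(pickle.dumps(wrapped))

print(received.GetProp("ChemDiv_IDNUMBER"), "is ID after pickling")
print(received.GetNumHeavyAtoms(), "atoms after pickling")
```

In the script from the original message, this would probably amount to
appending PropertyMol(compFile[i]) to the list instead of compFile[i] before
calling pool.map.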
----
Brian Kelley
> On Jan 13, 2017, at 6:39 PM, Paul Novick <paul.nov...@gmail.com> wrote:
>
> Hello All
>
> I am getting strange behaviour for mols passed into multiprocessing Pools. I
> am finding that all of the SD properties for the mol seem to disappear within
> the worker process. In the following, I am attempting to retrieve the
> 'ChemDiv_IDNUMBER' property from a series of mols. When doing this in a
> loop outside of a worker process, the value is retrieved as expected.
> However, within the worker, the property does not exist.
>
> compFile = Chem.SDMolSupplier('mols.sdf')
> iterator = []
>
> for i in range(5):
>     iterator.append(compFile[i])
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>
>     return mol.GetNumHeavyAtoms()
>
> pool = Pool(3)
> result=pool.map(lookupForI, iterator)
> pool.close()
> pool.join()
> for ares in result:
>     print(ares)
>
>
> gives the following
> 20 atoms in loop
> 000L-0408 is ID in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 20 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 26 atoms in worker
> 18 atoms in worker
> ---------------------------------------------------------------------------
> RemoteTraceback Traceback (most recent call last)
> RemoteTraceback:
> """
> Traceback (most recent call last):
> File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py",
> line 119, in worker
> result = (True, func(*args, **kwds))
> File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py",
> line 44, in mapstar
> return list(map(*args))
> File "<ipython-input-98-b305529073c1>", line 16, in lookupForI
> print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> KeyError: 'ChemDiv_IDNUMBER'
> """
>
> The above exception was the direct cause of the following exception:
>
> KeyError Traceback (most recent call last)
> <ipython-input-98-b305529073c1> in <module>()
> 35
> 36 pool = Pool(3)
> ---> 37 result=pool.map(lookupForI, iterator)
> 38 pool.close()
> 39 pool.join()
>
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in
> map(self, func, iterable, chunksize)
> 258 in a list that is returned.
> 259 '''
> --> 260 return self._map_async(func, iterable, mapstar,
> chunksize).get()
> 261
> 262 def starmap(self, func, iterable, chunksize=None):
>
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in
> get(self, timeout)
> 606 return self._value
> 607 else:
> --> 608 raise self._value
> 609
> 610 def _set(self, i, obj):
>
> KeyError: 'ChemDiv_IDNUMBER'
>
>
>
> And, when looking to see if any properties are associated with the mol using
> GetPropNames, I find no properties in the worker process, but all of the
> properties exist within the loop.
>
> iterator = []
>
> for i in range(5):
>     iterator.append(compFile[i])
>     print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(len([x for x in mol.GetPropNames()]), 'properties in worker')
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>
>     return mol.GetNumHeavyAtoms()
> ...
>
> gives
>
>
> 76 properties in loop
> 20 atoms in loop
> 000L-0408 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 76 properties in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 0 properties in worker
> 0 properties in worker
> 20 atoms in worker
> 0 properties in worker
> 18 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 0 properties in worker
> 26 atoms in worker
> 0 properties in worker
> ---------------------------------------------------------------------------
> RemoteTraceback Traceback (most recent call last)
> ...
> Any ideas on where the missing data went, or how to overcome this issue?
> Thanks in advance for your thoughts!
>
> Best
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss