By default, RDKit molecules do not include their properties when they are
pickled, and pickling is how Python multiprocessing transfers mols to the
worker processes.

Wrapping them in a PropertyMol should solve the issue:

http://www.rdkit.org/Python_Docs/rdkit.Chem.PropertyMol.PropertyMol-class.html
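
A minimal sketch of the idea, not tested against your file: the molecule below is a stand-in built from SMILES and the property value is made up. A pickle round trip mimics what multiprocessing does when it hands an argument to a worker.

```python
import pickle

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

# Hypothetical stand-in for one record read from the SD file.
mol = Chem.MolFromSmiles("CCO")
mol.SetProp("ChemDiv_IDNUMBER", "EXAMPLE-0001")  # illustrative value only

# multiprocessing pickles each argument before sending it to a worker,
# so a pickle round trip shows what the worker would receive.
wrapped = PropertyMol(mol)
received = pickle.loads(pickle.dumps(wrapped))

# The wrapped molecule keeps its property and its structure.
print(received.GetProp("ChemDiv_IDNUMBER"))
print(received.GetNumHeavyAtoms())
```

In your script that would presumably mean appending PropertyMol(compFile[i]) to the list you pass to pool.map, instead of compFile[i] itself.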

----
Brian Kelley

> On Jan 13, 2017, at 6:39 PM, Paul Novick <paul.nov...@gmail.com> wrote:
> 
> Hello All
> 
> I am getting strange behaviour for mols passed into multiprocessing Pools.  I 
> am finding that all of the SD properties for the mol seem to disappear within 
> the worker process.  In the following, I am attempting to retrieve the 
> 'ChemDiv_IDNUMBER' property from a series of mols.  When doing this in a 
> loop outside of a worker process, the value is retrieved as expected.  
> However, within the worker, the property does not exist.  
> 
> from multiprocessing import Pool
> from rdkit import Chem
> 
> compFile = Chem.SDMolSupplier('mols.sdf')
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>     
>     return mol.GetNumHeavyAtoms()
> 
> pool = Pool(3)
> result=pool.map(lookupForI, iterator)
> pool.close()
> pool.join()
> for ares in result:
>     print(ares)
> 
> 
> gives the following
> 20 atoms in loop
> 000L-0408 is ID in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 20 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 26 atoms in worker
> 18 atoms in worker
> ---------------------------------------------------------------------------
> RemoteTraceback                           Traceback (most recent call last)
> RemoteTraceback: 
> """
> Traceback (most recent call last):
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", 
> line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", 
> line 44, in mapstar
>     return list(map(*args))
>   File "<ipython-input-98-b305529073c1>", line 16, in lookupForI
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> KeyError: 'ChemDiv_IDNUMBER'
> """
> 
> The above exception was the direct cause of the following exception:
> 
> KeyError                                  Traceback (most recent call last)
> <ipython-input-98-b305529073c1> in <module>()
>      35 
>      36 pool = Pool(3)
> ---> 37 result=pool.map(lookupForI, iterator)
>      38 pool.close()
>      39 pool.join()
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in 
> map(self, func, iterable, chunksize)
>     258         in a list that is returned.
>     259         '''
> --> 260         return self._map_async(func, iterable, mapstar, 
> chunksize).get()
>     261 
>     262     def starmap(self, func, iterable, chunksize=None):
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in 
> get(self, timeout)
>     606             return self._value
>     607         else:
> --> 608             raise self._value
>     609 
>     610     def _set(self, i, obj):
> 
> KeyError: 'ChemDiv_IDNUMBER'
> 
> 
> 
> And, when looking to see if any properties are associated with the mol using 
> GetPropNames, I find no properties in the worker process, but all of the 
> properties exist within the loop.
> 
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(len([x for x in mol.GetPropNames()]), 'properties in worker')
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
>     
>     return mol.GetNumHeavyAtoms()
> ...
> 
> gives
> 
> 
> 76 properties in loop
> 20 atoms in loop
> 000L-0408 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 76 properties in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 0 properties in worker
> 0 properties in worker
> 20 atoms in worker
> 0 properties in worker
> 18 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 0 properties in worker
> 26 atoms in worker
> 0 properties in worker
> --------------------------------------------------------------------------- 
> RemoteTraceback Traceback (most recent call last)
> ...
> Any ideas on where the missing data went, or how to overcome this issue?  
> Thanks in advance for your thoughts!
> 
> Best
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss