Re: [Rdkit-discuss] Missing Properties for Mol in Multiprocessing Pool

2017-01-13 Thread Brian Kelley
By default, RDKit molecules don't pickle their properties.  Python 
multiprocessing uses pickling to transfer mols to the worker processes, so the 
properties are dropped along the way.
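
As a rough sketch of the mechanism (not taken from the original post): a pickle 
round trip is the same serialization step multiprocessing applies to each 
argument.  The SMILES and the ID value below are made-up placeholders, and 
whether the plain copy still carries the property may depend on the RDKit 
version and its default pickle settings, so no output is claimed for that line.

```python
import pickle

from rdkit import Chem

# Placeholder molecule and ID, standing in for a record read from an SD file
mol = Chem.MolFromSmiles('CCO')
mol.SetProp('ChemDiv_IDNUMBER', 'DEMO-0001')

# The same serialization step multiprocessing applies to each argument
clone = pickle.loads(pickle.dumps(mol))

# The structure survives the round trip
print(clone.GetNumHeavyAtoms(), 'atoms after round trip')
# On setups where properties are not pickled, this reports that the ID is gone
print(bool(clone.HasProp('ChemDiv_IDNUMBER')), 'is "property survived"')
```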

Wrapping them in a PropertyMol should solve the issue:

http://www.rdkit.org/Python_Docs/rdkit.Chem.PropertyMol.PropertyMol-class.html
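
A minimal sketch of that suggestion, under a few assumptions: the molecules are 
built from SMILES with made-up IDs instead of being read from 'mols.sdf', and 
the helper names make_mols and lookup_id are hypothetical.  With an SD file, 
the same wrapping would be applied to each mol coming out of the supplier.

```python
from multiprocessing import Pool

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol


def make_mols():
    # Stand-ins for SD file records; the SMILES and IDs are placeholders
    mols = []
    for smiles, ident in [('CCO', 'DEMO-0001'), ('c1ccccc1', 'DEMO-0002')]:
        mol = Chem.MolFromSmiles(smiles)
        mol.SetProp('ChemDiv_IDNUMBER', ident)
        # PropertyMol carries the properties along when it is pickled
        mols.append(PropertyMol(mol))
    return mols


def lookup_id(mol):
    # Runs in the worker process; the property is read from the unpickled mol
    return mol.GetProp('ChemDiv_IDNUMBER'), mol.GetNumHeavyAtoms()


if __name__ == '__main__':
    with Pool(2) as pool:
        results = pool.map(lookup_id, make_mols())
    for ident, n_atoms in results:
        print(ident, 'has', n_atoms, 'heavy atoms')
```

The wrapping has to happen before the mols are handed to pool.map, since that 
is the point where they get pickled.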


Brian Kelley

> On Jan 13, 2017, at 6:39 PM, Paul Novick wrote:
> 
> Hello All
> 
> I am getting strange behaviour for mols passed into multiprocessing Pools.  I 
> am finding that all of the SD properties for the mol seem to disappear within 
> the worker process.  In the following, I am attempting to retrieve the 
> 'ChemDiv_IDNUMBER' property from a series of mols.  When doing this in a 
> loop outside of a worker process, the value is retrieved as expected.  
> However, within the worker, the property does not exist.  
> 
> compFile = Chem.SDMolSupplier('mols.sdf')
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
>     return mol.GetNumHeavyAtoms()
> 
> pool = Pool(3)
> result=pool.map(lookupForI, iterator)
> pool.close()
> pool.join()
> for ares in result:
>     print(ares)
> 
> 
> gives the following
> 20 atoms in loop
> 000L-0408 is ID in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 20 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 26 atoms in worker
> 18 atoms in worker
> ---
> RemoteTraceback   Traceback (most recent call last)
> RemoteTraceback: 
> """
> Traceback (most recent call last):
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File "", line 16, in lookupForI
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> KeyError: 'ChemDiv_IDNUMBER'
> """
> 
> The above exception was the direct cause of the following exception:
> 
> KeyError  Traceback (most recent call last)
>  in ()
>      35 
>      36 pool = Pool(3)
> ---> 37 result=pool.map(lookupForI, iterator)
>      38 pool.close()
>      39 pool.join()
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
>     258 in a list that is returned.
>     259 '''
> --> 260 return self._map_async(func, iterable, mapstar, chunksize).get()
>     261 
>     262 def starmap(self, func, iterable, chunksize=None):
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
>     606 return self._value
>     607 else:
> --> 608 raise self._value
>     609 
>     610 def _set(self, i, obj):
> 
> KeyError: 'ChemDiv_IDNUMBER'
> 
> 
> 
> And, when looking to see if any properties are associated with the mol using 
> GetPropNames, I find no properties in the worker process, but all of the 
> properties exist within the loop.
> 
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(len([x for x in mol.GetPropNames()]), 'properties in worker')
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
>     return mol.GetNumHeavyAtoms()
> ...
> 
> gives
> 
> 
> 76 properties in loop
> 20 atoms in loop
> 000L-0408 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 76 properties in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 0 properties in worker
> 0 properties in worker
> 20 atoms in worker
> 0 properties in worker
> 18 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 0 properties in worker
> 26 atoms in worker
> 0 properties in worker
> --- 
> RemoteTraceback Traceback (most recent call last)
> ...
> Any ideas on where the missing data went, or how to overcome this issue?  
> Thanks in advance for your thoughts!
> 
> Best
> 

[Rdkit-discuss] Missing Properties for Mol in Multiprocessing Pool

2017-01-13 Thread Paul Novick
Hello All

I am getting strange behaviour for mols passed into multiprocessing Pools.
I am finding that all of the SD properties for the mol seem to disappear
within the worker process.  In the following, I am attempting to retrieve
the 'ChemDiv_IDNUMBER' property from a series of mols.  When doing this in
a loop outside of a worker process, the value is retrieved as expected.
However, within the worker, the property does not exist.

compFile = Chem.SDMolSupplier('mols.sdf')
iterator = []

for i in range(5):
    iterator.append(compFile[i])
    print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
    print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

def lookupForI(mol):
    thisresult = [0,0,0,0,0,0]
    print(mol.GetNumHeavyAtoms(), 'atoms in worker')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    return mol.GetNumHeavyAtoms()

pool = Pool(3)
result=pool.map(lookupForI, iterator)
pool.close()
pool.join()
for ares in result:
    print(ares)


gives the following

20 atoms in loop
000L-0408 is ID in loop
18 atoms in loop
000L-1176 is ID in loop
18 atoms in loop
000L-1268 is ID in loop
26 atoms in loop
000L-2413 is ID in loop
18 atoms in loop
000L-5632 is ID in loop
20 atoms in worker
18 atoms in worker
18 atoms in worker
26 atoms in worker
18 atoms in worker

---
RemoteTraceback   Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "", line 16, in lookupForI
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
KeyError: 'ChemDiv_IDNUMBER'
"""

The above exception was the direct cause of the following exception:

KeyError  Traceback (most recent call last)
 in ()
     35
     36 pool = Pool(3)
---> 37 result=pool.map(lookupForI, iterator)
     38 pool.close()
     39 pool.join()

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258 in a list that is returned.
    259 '''
--> 260 return self._map_async(func, iterable, mapstar, chunksize).get()
    261
    262 def starmap(self, func, iterable, chunksize=None):

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
    606 return self._value
    607 else:
--> 608 raise self._value
    609
    610 def _set(self, i, obj):

KeyError: 'ChemDiv_IDNUMBER'



And, when looking to see if any properties are associated with the mol
using GetPropNames, I find no properties in the worker process, but
all of the properties exist within the loop.

iterator = []

for i in range(5):
    iterator.append(compFile[i])
    print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
    print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
    print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

def lookupForI(mol):
    thisresult = [0,0,0,0,0,0]
    print(len([x for x in mol.GetPropNames()]), 'properties in worker')
    print(mol.GetNumHeavyAtoms(), 'atoms in worker')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    return mol.GetNumHeavyAtoms()
...

gives


76 properties in loop
20 atoms in loop
000L-0408 is ID in loop
76 properties in loop
18 atoms in loop
000L-1176 is ID in loop
76 properties in loop
18 atoms in loop
000L-1268 is ID in loop
76 properties in loop
26 atoms in loop
000L-2413 is ID in loop
76 properties in loop
18 atoms in loop
000L-5632 is ID in loop
0 properties in worker
0 properties in worker
20 atoms in worker
0 properties in worker
18 atoms in worker
18 atoms in worker
18 atoms in worker
0 properties in worker
26 atoms in worker
0 properties in worker

---
RemoteTraceback Traceback (most recent call last)
...

Any ideas on where the missing data went, or how to overcome this
issue?  Thanks in advance for your thoughts!

Best