Hello All,

I am getting strange behaviour for mols passed into multiprocessing Pools.
All of the SD properties for the mol seem to disappear within the worker
process. In the following, I am attempting to retrieve the
'ChemDiv_IDNUMBER' property from a series of mols. When I do this in a
loop outside of a worker process, the value is retrieved as expected.
Within the worker, however, the property does not exist.

    from multiprocessing import Pool
    from rdkit import Chem

    compFile = Chem.SDMolSupplier('mols.sdf')

    # Collect the first five mols, checking atoms and ID in the main process
    iterator = []
    for i in range(5):
        iterator.append(compFile[i])
        print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
        print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    # Worker function: repeats the same two lookups inside the pool
    def lookupForI(mol):
        thisresult = [0,0,0,0,0,0]
        print(mol.GetNumHeavyAtoms(), 'atoms in worker')
        print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
        return mol.GetNumHeavyAtoms()

    pool = Pool(3)
    result=pool.map(lookupForI, iterator)
    pool.close()
    pool.join()

    for ares in result:
        print(ares)

This gives the following:
20 atoms in loop
000L-0408 is ID in loop
18 atoms in loop
000L-1176 is ID in loop
18 atoms in loop
000L-1268 is ID in loop
26 atoms in loop
000L-2413 is ID in loop
18 atoms in loop
000L-5632 is ID in loop
20 atoms in worker
18 atoms in worker
18 atoms in worker
26 atoms in worker
18 atoms in worker
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-98-b305529073c1>", line 16, in lookupForI
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
KeyError: 'ChemDiv_IDNUMBER'
"""

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-98-b305529073c1> in <module>()
     35
     36 pool = Pool(3)
---> 37 result=pool.map(lookupForI, iterator)
     38 pool.close()
     39 pool.join()

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258         in a list that is returned.
    259         '''
--> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
    261
    262     def starmap(self, func, iterable, chunksize=None):

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
    606             return self._value
    607         else:
--> 608             raise self._value
    609
    610     def _set(self, i, obj):

KeyError: 'ChemDiv_IDNUMBER'

Also, when I use GetPropNames to check whether any properties are
associated with the mol, I find none in the worker process, although
all of the properties are present in the loop.

    iterator = []
    for i in range(5):
        iterator.append(compFile[i])
        print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
        print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
        print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

    def lookupForI(mol):
        thisresult = [0,0,0,0,0,0]
        print(len([x for x in mol.GetPropNames()]), 'properties in worker')
        print(mol.GetNumHeavyAtoms(), 'atoms in worker')
        print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
        return mol.GetNumHeavyAtoms()
    ...

This gives:
76 properties in loop
20 atoms in loop
000L-0408 is ID in loop
76 properties in loop
18 atoms in loop
000L-1176 is ID in loop
76 properties in loop
18 atoms in loop
000L-1268 is ID in loop
76 properties in loop
26 atoms in loop
000L-2413 is ID in loop
76 properties in loop
18 atoms in loop
000L-5632 is ID in loop
0 properties in worker
0 properties in worker
20 atoms in worker
0 properties in worker
18 atoms in worker
18 atoms in worker
18 atoms in worker
0 properties in worker
26 atoms in worker
0 properties in worker
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
...
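
My unconfirmed guess is that the properties are lost in transit rather than inside the worker itself. Pool.map sends each argument to the worker through pickle, so if a mol's pickled form carried only its structure, the worker would see exactly this: correct atom counts and zero properties. The sketch below reproduces that symptom with a made-up stand-in class. FakeMol and survives_pickle are hypothetical names for illustration only, and the assumption that a mol pickles without its properties is mine, not something I have verified against RDKit.

```python
import pickle


class FakeMol:
    """Hypothetical stand-in for a mol; NOT the RDKit class."""

    def __init__(self, n_atoms, props):
        self.n_atoms = n_atoms
        self.props = dict(props)

    def GetNumHeavyAtoms(self):
        return self.n_atoms

    def GetPropNames(self):
        return list(self.props)

    def __getstate__(self):
        # Assumption for illustration: only structural data is serialized.
        return {"n_atoms": self.n_atoms}

    def __setstate__(self, state):
        # The restored object has its atoms back but starts with no properties.
        self.n_atoms = state["n_atoms"]
        self.props = {}


def survives_pickle(mol):
    """Round-trip through pickle, the same transport Pool.map uses for arguments."""
    clone = pickle.loads(pickle.dumps(mol))
    return clone.GetNumHeavyAtoms(), len(clone.GetPropNames())
```

If this guess is right, running the same round trip on a real mol in the main process (pickle.loads(pickle.dumps(compFile[0])) followed by GetPropNames) should also show an empty property list, with no Pool involved at all.
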
Any ideas on where the missing data went, or how to overcome this
issue? Thanks in advance for your thoughts!
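
In the meantime, a stopgap I am considering is to read the properties in the main process, where they are still available, and send them to the workers as a plain dict next to each mol. This is only a minimal sketch: extract_props, make_tasks and lookup_with_props are names I made up, and it relies solely on the GetPropNames, GetProp and GetNumHeavyAtoms calls already used above.

```python
def extract_props(mol):
    """Copy a mol's properties into a plain dict (call this in the main process)."""
    return {name: mol.GetProp(name) for name in mol.GetPropNames()}


def make_tasks(mols):
    """Pair each mol with its property dict so that both reach the worker."""
    return [(mol, extract_props(mol)) for mol in mols]


def lookup_with_props(task):
    """Worker: read the ID from the dict instead of from the mol itself."""
    mol, props = task
    return props.get('ChemDiv_IDNUMBER'), mol.GetNumHeavyAtoms()
```

The call would then become result = pool.map(lookup_with_props, make_tasks(iterator)). It works around the symptom without explaining it, so I would still prefer a proper fix.
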
Best