Hello All

I am getting strange behaviour for mols passed into multiprocessing Pools:
all of the SD properties on a mol seem to disappear inside the worker
process. In the following, I am trying to retrieve the 'ChemDiv_IDNUMBER'
property from a series of mols. When I do this in a loop outside a worker
process, the value is retrieved as expected. Within the worker, however,
the property does not exist.
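Some background that may help: `multiprocessing.Pool.map` hands each item to the workers by pickling it, so a worker only ever sees what survives a pickle round trip. The minimal sketch below reproduces the same symptom without RDKit. `FakeMol` is a hypothetical stand-in whose pickled state deliberately leaves out its property dict; that this is what happens to the SD properties of an RDKit mol here is an assumption, not a description of RDKit internals.

```python
import pickle
from multiprocessing import Pool


class FakeMol:
    """Hypothetical stand-in for a mol: an atom count plus a property dict."""

    def __init__(self, num_atoms, props=None):
        self.num_atoms = num_atoms
        self.props = dict(props or {})

    def __getstate__(self):
        # Assumption being illustrated: only the "structure" is pickled,
        # the properties are dropped.
        return {"num_atoms": self.num_atoms, "props": {}}

    def __setstate__(self, state):
        self.__dict__.update(state)


def describe(mol):
    # Runs in the worker, on the unpickled copy of the mol.
    return mol.num_atoms, len(mol.props)


if __name__ == "__main__":
    mols = [FakeMol(20, {"ChemDiv_IDNUMBER": "000L-0408"}),
            FakeMol(18, {"ChemDiv_IDNUMBER": "000L-1176"})]
    # A direct pickle round trip already loses the properties...
    print(len(pickle.loads(pickle.dumps(mols[0])).props))  # 0
    # ...and Pool.map goes through the same pickling step.
    with Pool(2) as pool:
        print(pool.map(describe, mols))  # [(20, 0), (18, 0)]
```

The atom count arrives intact while the property count drops to zero, which matches the pattern in the output below.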

from multiprocessing import Pool
from rdkit import Chem

compFile = Chem.SDMolSupplier('mols.sdf')
iterator = []

# Collect the first five mols and check the property in the parent process
for i in range(5):
    mol = compFile[i]
    iterator.append(mol)
    print(mol.GetNumHeavyAtoms(), 'atoms in loop')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

# Worker: runs in a separate process on each mol from iterator
def lookupForI(mol):
    print(mol.GetNumHeavyAtoms(), 'atoms in worker')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
    return mol.GetNumHeavyAtoms()

pool = Pool(3)
result = pool.map(lookupForI, iterator)
pool.close()
pool.join()
for ares in result:
    print(ares)


gives the following

20 atoms in loop
000L-0408 is ID in loop
18 atoms in loop
000L-1176 is ID in loop
18 atoms in loop
000L-1268 is ID in loop
26 atoms in loop
000L-2413 is ID in loop
18 atoms in loop
000L-5632 is ID in loop
20 atoms in worker
18 atoms in worker
18 atoms in worker
26 atoms in worker
18 atoms in worker

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-98-b305529073c1>", line 16, in lookupForI
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
KeyError: 'ChemDiv_IDNUMBER'
"""

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-98-b305529073c1> in <module>()
     35 
     36 pool = Pool(3)
---> 37 result=pool.map(lookupForI, iterator)
     38 pool.close()
     39 pool.join()

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258         in a list that is returned.
    259         '''
--> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
    261 
    262     def starmap(self, func, iterable, chunksize=None):

/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
    606             return self._value
    607         else:
--> 608             raise self._value
    609 
    610     def _set(self, i, obj):

KeyError: 'ChemDiv_IDNUMBER'



And when I use GetPropNames to check which properties are attached to the
mol, I find no properties at all in the worker process, while all of them
exist within the loop.

iterator = []

for i in range(5):
    mol = compFile[i]
    iterator.append(mol)
    print(len(list(mol.GetPropNames())), 'properties in loop')
    print(mol.GetNumHeavyAtoms(), 'atoms in loop')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')

def lookupForI(mol):
    print(len(list(mol.GetPropNames())), 'properties in worker')
    print(mol.GetNumHeavyAtoms(), 'atoms in worker')
    print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
    return mol.GetNumHeavyAtoms()
...

gives


76 properties in loop
20 atoms in loop
000L-0408 is ID in loop
76 properties in loop
18 atoms in loop
000L-1176 is ID in loop
76 properties in loop
18 atoms in loop
000L-1268 is ID in loop
76 properties in loop
26 atoms in loop
000L-2413 is ID in loop
76 properties in loop
18 atoms in loop
000L-5632 is ID in loop
0 properties in worker
0 properties in worker
20 atoms in worker
0 properties in worker
18 atoms in worker
18 atoms in worker
18 atoms in worker
0 properties in worker
26 atoms in worker
0 properties in worker

---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
...

Any ideas on where the missing data went, or how to overcome this
issue?  Thanks in advance for your thoughts!
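One possible workaround, sketched under the assumption that the properties are simply not included in the pickle: copy them into a plain dict in the parent process (with the same `GetPropNames`/`GetProp` calls used above) and send that dict along with each mol. `StubMol`, `props_as_dict` and `lookup_for_i` are hypothetical names; `StubMol` only mimics those two Mol methods so the sketch runs without RDKit. More recent RDKit releases may also be able to keep properties in the pickle via `Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)`; check whether your installed version provides it.

```python
from multiprocessing import Pool


class StubMol:
    """Hypothetical stand-in exposing only the two Mol methods used below."""

    def __init__(self, props):
        self._props = dict(props)

    def GetPropNames(self):
        return list(self._props)

    def GetProp(self, name):
        return self._props[name]


def props_as_dict(mol):
    # Runs in the parent process: copy every property into a plain dict
    # of strings, which pickles without losing anything.
    return {name: mol.GetProp(name) for name in mol.GetPropNames()}


def lookup_for_i(task):
    # Runs in the worker: read the ID from the dict that travelled with
    # the mol (the mol itself is still there for structure-based work).
    mol, props = task
    return props.get("ChemDiv_IDNUMBER")


if __name__ == "__main__":
    mols = [StubMol({"ChemDiv_IDNUMBER": "000L-0408"}),
            StubMol({"ChemDiv_IDNUMBER": "000L-1176"})]
    # Pair each mol with its properties before handing work to the pool.
    tasks = [(mol, props_as_dict(mol)) for mol in mols]
    with Pool(2) as pool:
        print(pool.map(lookup_for_i, tasks))  # ['000L-0408', '000L-1176']
```

The design choice here is to keep the worker input made of types that are known to pickle completely (plain dicts of strings), rather than relying on how the mol object itself is serialized.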

Best
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss