Hi Jiajun,

This is just an FYI.  Unsurprisingly, this patch seems to interfere with
a proper checkpoint / restart.  I think the cause is a race between the
states of the threads at the time of the checkpoint, which leaves the
state of the whole program inconsistent across the threads.

As I understand the script, it creates a pool of workers to process a
list of files and eventually move them to a new directory.  After
restart, at least one thread can't find the file anymore, because
another thread had already moved it to its new place, but presumably it
never got the acknowledgment from the worker thread.
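
For what it's worth, here is a minimal sketch of how a worker could
tolerate that situation.  It is not taken from the actual script, which I
am only describing from memory: I am assuming it is Python, the name
move_if_present is hypothetical, and I am assuming the source and
destination directories sit on the same filesystem so that os.replace
works.  The idea is that a missing source file counts as success when the
file is already at its destination, so a move that is repeated after
restart does no harm.

```python
import os
import tempfile


def move_if_present(src, dst_dir):
    """Move src into dst_dir, treating an already-completed move as success.

    Hypothetical helper: the real script's names and layout are unknown.
    """
    dst = os.path.join(dst_dir, os.path.basename(src))
    try:
        # Rename within one filesystem; raises FileNotFoundError if src is gone.
        os.replace(src, dst)
        return "moved"
    except FileNotFoundError:
        # Another worker (or the same move before the checkpoint) may have
        # finished the job already; only then is the missing source harmless.
        if os.path.exists(dst):
            return "already-moved"
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        incoming = os.path.join(root, "incoming")
        done = os.path.join(root, "done")
        os.mkdir(incoming)
        os.mkdir(done)
        path = os.path.join(incoming, "a.dat")
        open(path, "w").close()
        print(move_if_present(path, done))  # first attempt does the move
        print(move_if_present(path, done))  # repeated attempt is tolerated
```

This only hides the symptom on the script side.  It does not make the
checkpointed thread states consistent, which is the part I suspect the
patch affects.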