[Dmtcp-forum] memory overhead at checkpoint

Vakho Tsulaia Thu, 07 Dec 2017 12:39:44 -0800

Hello,

We are using DMTCP to checkpoint the detector simulation program of the ATLAS experiment at CERN. Before the checkpoint is triggered, the RSS of our application is ~1.4GB. When we trigger the checkpoint, it takes DMTCP a few seconds to create the checkpoint image on disk. During this time the RSS goes up from 1.4GB to 1.8GB. When we restart the application, it continues with 1.8GB, which means the restarted application uses 400MB more than it would have used without checkpoint-restart.

Shortly after restart the application forks several sub-processes. It turns out that this extra 400MB does not get shared between the sub-processes (which otherwise share memory pages thanks to Linux Copy-on-Write), and as a result we get an NProcesses*400MB memory overhead for the entire system.

Has anyone experienced similar problems? Is there anything we can do about it?


I'm happy to provide more details about our application, if necessary.

Thank you,
-- vakho


