Hi,
I'm attempting to checkpoint a Java application. I start it with
"dmtcp_launch --ckpt-open-files java -jar ..." and set the checkpoint
directory to a networked file system.
Checkpointing itself seems to work as expected, but restarts are failing:
$ /scratch/dmtcp_restart_script.sh
dmtcp_coordinator starting...
Host: ****** (***.***.***.***)
Port: 7779
Checkpoint Interval: disabled (checkpoint manually instead)
Exit on last client: 1
Backgrounding...
[22895] mtcp_restart.c:1248 read_shared_memory_area_from_file:
mapping current version of /app/BEAST/1.8.2/lib/beast.jar into memory;
_not_ file as it existed at time of checkpoint.
(Or this may be a file shared by multiple processes.)
Change mtcp_restart.c:1248 and re-compile, if you want different behavior.
[40000] ERROR at fileconnection.cpp:943 in areFilesEqual;
REASON='JASSERT(Util::readAll(fd, buf2, readBytes) == readBytes) failed'
java (40000): Terminating...
This looks very similar to an issue reported here:
http://sourceforge.net/p/dmtcp/mailman/message/32939254/
I have tried the patch attached to the final update in that thread. Restart
still fails, but the error message changes:
$ dmtcp_restart_script.sh
dmtcp_coordinator starting...
Host: ****** (***.***.***.***)
Port: 7779
Checkpoint Interval: disabled (checkpoint manually instead)
Exit on last client: 1
Backgrounding...
[23895] mtcp_restart.c:1248 read_shared_memory_area_from_file:
mapping current version of /app/BEAST/1.8.2/lib/beast.jar into memory;
_not_ file as it existed at time of checkpoint.
(Or this may be a file shared by multiple processes.)
Change mtcp_restart.c:1248 and re-compile, if you want different behavior.
[40000] ERROR at fileconnection.cpp:985 in writeFileFromFd;
REASON='JASSERT(_real_lseek(fd, 0, SEEK_SET) == 0) failed'
fd = -1
(strerror((*__errno_location ()))) = Bad file descriptor
java (40000): Terminating...
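As far as I can tell, the second failure is just lseek() being called with
fd = -1, which would suggest the file was never successfully opened during
restart. That is an assumption on my part; I have not traced it through the
DMTCP source. A minimal Python sketch, independent of DMTCP, that reproduces
only that errno (try_rewind is a hypothetical helper name of my own):

```python
import errno
import os


def try_rewind(fd):
    """Seek to the start of fd; return 0 on success, else the errno raised."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno


# -1 is what a failed open() returns, so seeking on it cannot succeed.
code = try_rewind(-1)
print(errno.errorcode.get(code), "-", os.strerror(code))
```

If that reading is right, the lseek assertion is a symptom, and the real
question is why the earlier open of the checkpointed file failed.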
Without checkpointing open files (that is, without --ckpt-open-files), the
restart works fine. Unfortunately, I need those files checkpointed. Any
advice would be appreciated.
Thanks
Michael