Hi Michael,

As with the previous one: if you could suggest a way for me to reproduce the
bug locally, it would be much easier to come up with a fix :-).
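
For illustration, a reproducer could be as small as the sketch below. It is
only a guess at what might exercise the same code path, not a confirmed test
case: the class name HoldOpenFile, the default file name, and the line count
and delay are all hypothetical. It keeps one file open and appends to it, so
a checkpoint taken while it runs has an open file to save.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical reproducer: holds a single file open and appends to it.
public class HoldOpenFile {

    // Appends `count` numbered lines to `path`, sleeping `sleepMs`
    // milliseconds after each write. Returns the number of lines written.
    public static int writeLines(Path path, int count, long sleepMs)
            throws IOException, InterruptedException {
        int written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(
                path, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < count; i++) {
                out.write("line " + i);
                out.newLine();
                out.flush();           // push each line to the file right away
                written++;
                Thread.sleep(sleepMs); // leave time to take a checkpoint
            }
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: ./held-open.txt, 600 lines, one per second.
        Path path = Paths.get(args.length > 0 ? args[0] : "held-open.txt");
        int n = writeLines(path, 600, 1000);
        System.out.println("wrote " + n + " lines to " + path);
    }
}
```

Run under "dmtcp_launch --ckpt-open-files java HoldOpenFile <file on the
networked file system>", checkpointed while it is still writing, and then
restarted with the generated dmtcp_restart_script.sh, this may or may not
show the same failure. Note that it uses a plain class rather than
"java -jar", which is one difference from the original setup.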

Best,
Kapil

On Wed, Jul 8, 2015 at 8:40 PM, Michael Gutteridge <
michael.gutteri...@gmail.com> wrote:

> Hi
>
> I'm attempting to checkpoint a Java application.  I'm using "dmtcp_launch
> --ckpt-open-files java -jar ..." to start the application and setting the
> checkpoint directory to a networked file system.
>
> Everything seems to work as expected, but restarts are failing:
>
> $ /scratch/dmtcp_restart_script.sh
> dmtcp_coordinator starting...
>     Host: ****** (***.***.***.***)
>     Port: 7779
>     Checkpoint Interval: disabled (checkpoint manually instead)
>     Exit on last client: 1
> Backgrounding...
> [22895] mtcp_restart.c:1248 read_shared_memory_area_from_file:
>   mapping current version of /app/BEAST/1.8.2/lib/beast.jar into memory;
>   _not_ file as it existed at time of checkpoint.
>   (Or this may be a file shared by multiple processes.)
>   Change mtcp_restart.c:1248 and re-compile, if you want different behavior.
> [40000] ERROR at fileconnection.cpp:943 in areFilesEqual;
> REASON='JASSERT(Util::readAll(fd, buf2, readBytes) == readBytes) failed'
> java (40000): Terminating...
>
> It seems very similar to an issue reported here:
>
> http://sourceforge.net/p/dmtcp/mailman/message/32939254/
>
> I have tried the patch attached to the final update in that thread.  Restart
> still fails, but the message changes:
>
> $ dmtcp_restart_script.sh
> dmtcp_coordinator starting...
>     Host: ****** (***.***.***.***)
>     Port: 7779
>     Checkpoint Interval: disabled (checkpoint manually instead)
>     Exit on last client: 1
> Backgrounding...
> [23895] mtcp_restart.c:1248 read_shared_memory_area_from_file:
>   mapping current version of /app/BEAST/1.8.2/lib/beast.jar into memory;
>   _not_ file as it existed at time of checkpoint.
>   (Or this may be a file shared by multiple processes.)
>   Change mtcp_restart.c:1248 and re-compile, if you want different behavior.
> [40000] ERROR at fileconnection.cpp:985 in writeFileFromFd;
> REASON='JASSERT(_real_lseek(fd, 0, SEEK_SET) == 0) failed'
>      fd = -1
>      (strerror((*__errno_location ()))) = Bad file descriptor
> java (40000): Terminating...
>
>
> Without checkpointing files the restart works fine.  Regrettably, I need
> to have those files checkpointed.  Any advice would be appreciated.
>
> Thanks
>
> Michael
>
>
>
>
> ------------------------------------------------------------------------------
> Don't Limit Your Business. Reach for the Cloud.
> GigeNET's Cloud Solutions provide you with the tools and support that
> you need to offload your IT needs and focus on growing your business.
> Configured For All Businesses. Start Your Cloud Today.
> https://www.gigenetcloud.com/
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>