Thanks Jiajun,

I was aware that the DMTCP_GDB_ATTACH_ON_RESTART option was not
added until version 2.4, but the documentation does state the
existence of the MTCP_RESTART_PAUSE capability for versions prior
to version 2.4.

Does anyone have any experience using this MTCP_RESTART_PAUSE option
with the earlier compilers?

Not worried about root permissions.

The major problem with using 2.4 is that we need the
--no-coordinator option, and that option was not working correctly
as of version 2.4.4, when we started using DMTCP.  It was also not
working correctly in version 2.5.0-rc1.

This problem with the --no-coordinator option was acknowledged by
Gene Cooperman to one of my associates as a bug back in February
of 2016.  In short, the problem was that the return value from the
dmtcp_checkpoint() function was always zero (instead of
DMTCP_AFTER_CHECKPOINT) whenever the --no-coordinator option was
used.  We have never heard back from Gene whether this bug has
been fixed.

I have examined the release notes (and NEWS files) from each
subsequent release of version 2.4 (2.4.5, 2.4.6, 2.4.7, 2.4.8)
and have found no mention of this --no-coordinator bug being
fixed.

Can you possibly check with Gene and the other developers and find
out whether this bug has in fact been fixed yet, and if so, as of
which version?

Once I get confirmation that the --no-coordinator option is working
correctly, I can upgrade to the latest 2.4 version.

In version 2.4, if I use the DMTCP_GDB_ATTACH_ON_RESTART option,
does that pause the restarted process long enough that I can
attach right at the point when my program resumes from the
checkpoint?  Or do I also need to set one of the DMTCP_RESTART_PAUSE
options before my dmtcp_launch command in order to get the pause
for attaching?

Thanks for taking the time to read this and answer the FOUR
questions I have posed.

Rick

-----------------------------------------------------------------------

On Tue, 8 Aug 2017, Jiajun Cao wrote:

Hi Rick,

The support for allowing gdb attach on restart was not added until
the 2.4 release.

Is there any possibility that you upgrade the installation to a
newer version? Note you don't need to have root privilege to do
that. If you want to test it locally, just compile the source code,
and add the bin path to your $PATH env var.

Best,
Jiajun

On Tue, Aug 08, 2017 at 03:53:02PM -0400, dmtcp-fo...@gusbus.org wrote:

Hi there,

I'm trying to understand the documented options for debugging a
restarted DMTCP process.

I've been testing a few different environment variables, trying to
get the dmtcp_restart process to pause for 15 seconds like it is
supposed to, so that I can attach to it with gdb.

There are various options in the documentation, environment variables
like:

DMTCP_RESTART_PAUSE
DMTCP_RESTART_PAUSE2
DMTCP_GDB_ATTACH_ON_RESTART

I am currently locked into using DMTCP version 2.3.1.  My program
was compiled using gcc 6.4.0.  I am running under CentOS release
6.8 Final, kernel 2.6.32-642.4.2.el6.x86_64 #1 SMP.


Have noted that the documentation states that prior to version 2.4
of DMTCP, the environment variable to use is named MTCP_RESTART_PAUSE.
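
A minimal sketch of the naming rule just described, assuming only what
this thread reports about the documentation (MTCP_RESTART_PAUSE before
version 2.4, DMTCP_RESTART_PAUSE from 2.4 on). The function name
pause_var_for_version is hypothetical, and nothing here has been
checked against the DMTCP sources:

```shell
# pause_var_for_version VERSION
# Print the restart-pause variable name that, according to this thread's
# reading of the documentation, applies to the given DMTCP version.
# Assumption: the name changed at 2.4; not verified against DMTCP itself.
pause_var_for_version() {
  major=${1%%.*}       # text before the first dot, e.g. "2"
  rest=${1#*.}         # text after the first dot, e.g. "3.1"
  minor=${rest%%.*}    # second version component, e.g. "3"
  if [ "$major" -lt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -lt 4 ]; }; then
    echo MTCP_RESTART_PAUSE
  else
    echo DMTCP_RESTART_PAUSE
  fi
}
```

For example, `pause_var_for_version 2.3.1` prints the older name. Which
command the variable has to be set on (dmtcp_launch, dmtcp_restart, or
both) is exactly the open question in this message, so the sketch does
not try to answer it.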

I have tried setting MTCP_RESTART_PAUSE prior to the dmtcp_launch
command, and prior to both dmtcp_launch and dmtcp_restart commands
with no luck.  I am able to use gdb to attach to the process (more
easily from a separate terminal because of the standard output being
produced from my process in the dmtcp_restart process terminal) but
since there is no pause, gdb doesn't attach until later in the process
after the restart, all based on how quickly I can get the gdb attach
command typed in and entered.  Not really a reliable way to go :)
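
One way to take the typing delay out of the manual attach described
above is to let a small script wait for the process and hand its PID to
gdb. This is only a sketch: it does not make the restarted process
pause, so gdb still attaches some time after the restart. The helper
names find_pid_by_name and wait_for_pid are hypothetical, the script
relies on the Linux /proc filesystem, and the name under which the
restarted process shows up at any given moment is an assumption:

```shell
# find_pid_by_name NAME
# Print the PID of the first process (in /proc glob order) whose command
# name, as stored in /proc/PID/comm (the kernel truncates it to 15
# characters), equals NAME. Linux only.
find_pid_by_name() {
  for dir in /proc/[0-9]*; do
    # A process may exit while we scan; skip entries we cannot read.
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    if [ "$comm" = "$1" ]; then
      echo "${dir#/proc/}"
      return 0
    fi
  done
  return 1
}

# wait_for_pid NAME [TRIES]
# Poll every 0.1 s (default: 600 tries) until NAME appears, then print
# its PID. Returns non-zero if the process never shows up.
wait_for_pid() {
  tries=${2:-600}
  while [ "$tries" -gt 0 ]; do
    find_pid_by_name "$1" && return 0
    sleep 0.1
    tries=$((tries - 1))
  done
  return 1
}
```

From a second terminal, after dmtcp_restart has been started, this
could be used as `gdb -p "$(wait_for_pid program.exe)"`.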

In any event, if there are known issues with getting the
restart process to pause in DMTCP version 2.3.1, then that would
explain it.  If, however, I am doing something wrong, then any help
would be appreciated.

My normal series of run commands goes like this (just like the DMTCP
documentation):

MTCP_RESTART_PAUSE=1 dmtcp_launch --disable-alloc-plugin --no-coordinator --port 0 program.exe < input.dat

dmtcp_restart --port 0 --port-file dmtcpportfile checkpoint.dmtcp < input.int &

Do you think any of the command line options are causing any issues, like
my lack of using the coordinator?  Also, my program always reads a file from
standard input even during a restart.

Thanks,
Rick



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum



