Hi William,

It turns out there is a typo in our code. Could you make the following
one-line change and try again?

diff --git a/src/util_exec.cpp b/src/util_exec.cpp
index e54f014..768d6e6 100644
--- a/src/util_exec.cpp
+++ b/src/util_exec.cpp
@@ -678,7 +678,7 @@ void Util::getDmtcpArgs(vector<string> &dmtcp_args)
   }

   if (dmtcp_modify_env_enabled != NULL && dmtcp_modify_env_enabled()) {
-    dmtcp_args.push_back("--modify_env");
+    dmtcp_args.push_back("--modify-env");
   }

   if (dmtcp_infiniband_enabled != NULL && dmtcp_infiniband_enabled()) {


Best,
Jiajun

On Fri, May 13, 2016 at 5:37 PM, William Fox <w...@lbl.gov> wrote:

> Sorry for the email, but while going through the debug output I came across
> the following difference, which I figure is relevant:
>
> I am not sure what is causing it yet.
>
> The bold lines mark where the output diverges; they help explain a bit of
> why the ssh process goes defunct. More to look into, of course...
> ######################################
> ### WITH --MODIFY-ENV PLUGIN #####
> ######################################
>
> [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling user function'
>      virtualTid = 45020
> [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
>      th->tid = 118062
>      th->virtual_tid = 45020
> [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user function'
>      dmtcp_gettid() = 45020
> *[46000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook; REASON='exit() in progress, disconnecting from dmtcp coordinator'*
> [46000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill checkpinthread'
>      ckptThread->tid = 117995
> [46000] TRACE at dmtcpworker.cpp:333 in cleanupWorker; REASON='disconnecting from dmtcp coordinator'
> [40000] TRACE at socketwrappers.cpp:59 in socket; REASON='socket created'
>      ret = 13
>      domain = 2
>      type = 1
>      protocol = 0
> [40000] TRACE at socketconnection.cpp:196 in TcpConnection; REASON='Creating TcpConnection.'
>      id() = 1c81169a0ee7cfa2-40000-57363f26(99543)
>      domain = 2
>      type = 1
>      protocol = 0
>
> ######################################
> ### WITHOUT --MODIFY-ENV PLUGIN ######
> ######################################
>
> [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling user function'
>      virtualTid = 45020
> [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
>      th->tid = 117444
>      th->virtual_tid = 45020
> [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user function'
>      dmtcp_gettid() = 45020
> *[46000] TRACE at socketwrappers.cpp:152 in process_accept; REASON='accepted incoming connection'*
> *     sockfd = 9*
> *     con->id() = 1c81169a0ee7cfa2-46000-57363ea3(99548)*
> [45000] TRACE at socketwrappers.cpp:152 in process_accept; REASON='accepted incoming connection'
>      sockfd = 19
>      con->id() = 1c81169a0ee7cfa2-45000-57363ea3(99540)
> [45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling libc:__clone'
> [45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread created'
>      tid = 117492
> [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling user function'
>      virtualTid = 45022
> [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
>      th->tid = 117492
>      th->virtual_tid = 45022
>
> On Fri, May 13, 2016 at 11:25 AM, William Fox <w...@lbl.gov> wrote:
>
>> Sure, no worries about the issue submission or the credit format you use.
>>
>> As mentioned, the 2.5 branch works with --modify-env in serial mode but not
>> in distributed mode. Thanks for the pointers; I'm looking forward to a
>> modify-env plugin that is compatible with distributed mode. I'll let you
>> know if I manage to find a fix myself. Thanks again.
>> On May 13, 2016 11:21 AM, "Gene Cooperman" <g...@ccs.neu.edu> wrote:
>>
>> Hi William.  So,
>>      git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
>> will probably work for you.  We'll set up a 2.5.0-rc2 for future users.
>>
>> For the additional note below, do you mind if I copy your report
>> from the latest e-mail into a GitHub issue? By default, I'll give
>> credit to you as "William Fox" (no e-mail address), but I can handle
>> that part however you wish.
>>
>> We're between spring and summer semester here, but we hope to look
>> at this seriously in the summer.
>>
>> Thanks,
>> - Gene
>>
>> On Thu, May 12, 2016 at 05:24:20PM -0700, William Fox wrote:
>> > CentOS version:
>> > LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>> > Distributor ID: CentOS
>> > Description: CentOS release 6.4 (Final)
>> > Release: 6.4
>> > Codename: Final
>> >
>> > Also: all of the above was correct. I discovered that warning.cpp was
>> > missing from the Makefiles but was not sure how to fix it.
>> >
>> > An additional note, which may be a known compatibility issue:
>> >
>> > With the following installed:
>> >   git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
>> > the expected behavior occurs for --ib in distributed mode and for
>> > --modify-env in serial mode. However, using --ib together with
>> > --modify-env, or --modify-env alone, in a distributed environment leaves
>> > [dmtcp_ssh] <defunct> when workers are spawned from the master through
>> > ssh calls over TCP.
>> >
>> > Without --modify-env it works correctly. Compiling with --enable-debug
>> > gives:
>> > [49000] TRACE at dmtcpworker.cpp:453 in waitForStage1Suspend; REASON='running'
>> > [49000] TRACE at dmtcpworker.cpp:412 in waitForCoordinatorMsg; REASON='waiting for SUSPEND message'
>> > [46000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook; REASON='exit() in progress, disconnecting from dmtcp coordinator'
>> > [46000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill checkpinthread'
>> >      ckptThread->tid = 112446
>> > [46000] TRACE at dmtcpworker.cpp:333 in cleanupWorker; REASON='disconnecting from dmtcp coordinator'
>> > [47000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook; REASON='exit() in progress, disconnecting from dmtcp coordinator'
>> > [47000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill checkpinthread'
>> >      ckptThread->tid = 112449
>> > [47000] TRACE at dmtcpworker.cpp:333 in cleanupWorker; REASON='disconnecting from dmtcp coordinator'
>> > [40000] TRACE at socketwrappers.cpp:59 in socket; REASON='socket created'
>> >      ret = 13
>> >      domain = 2
>> >      type = 1
>> >      protocol = 0
>> > [40000] TRACE at socketconnection.cpp:196 in TcpConnection; REASON='Creating TcpConnection.'
>> >      id() = 3d1ce25df201d031-40000-57351cc4(99546)
>> >      domain = 2
>> >      type = 1
>> >      protocol = 0
>> > [40000] TRACE at socketwrappers.cpp:89 in connect; REASON='connected'
>> >      sockfd = 13
>> >      con->id() = 3d1ce25df201d031-40000-57351cc4(99546)
>> > [45000] TRACE at socketwrappers.cpp:152 in process_accept; REASON='accepted incoming connection'
>> >      sockfd = 19
>> >      con->id() = 3d1ce25df201d031-45000-57351d2e(99537)
>> > [45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling libc:__clone'
>> > [45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread created'
>> >      tid = 112538
>> > [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling user function'
>> >      virtualTid = 45018
>> > [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
>> >      th->tid = 112538
>> >      th->virtual_tid = 45018
>> > [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user function'
>> >      dmtcp_gettid() = 45018
>> > [45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling libc:__clone'
>> > [45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread created'
>> >      tid = 112539
>> > [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling user function'
>> >      virtualTid = 45020
>> > [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
>> >      th->tid = 112539
>> >      th->virtual_tid = 45020
>> > [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user function'
>> >      dmtcp_gettid() = 45020
>> >
>> >
>> > On Thu, May 12, 2016 at 3:54 PM, Gene Cooperman <g...@ccs.neu.edu> wrote:
>> >
>> > > William,
>> > >     Could you try out the latest development branch:
>> > >   git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
>> > > This will give you the commit 00f5f74 that Rohan refers to.
>> > > Please let us know if this works for you.
>> > >
>> > > Kapil,
>> > >     I think the conclusion here is that it's time to release:
>> > >   dmtcp-2.5.0-rc2
>> > > Could you do that when you have a chance?
>> > >
>> > > Thanks,
>> > > - Gene
>> > >
>> > > On Thu, May 12, 2016 at 06:46:56PM -0400, Rohan Garg wrote:
>> > > > Gene and William,
>> > > >
>> > > > I'm not sure if it's the same bug that we fixed in commit 00f5f74. If
>> > > > it is, then here's the story behind it:
>> > > >
>> > > > The bug was introduced in commit 3a53d4f, where we added some warning
>> > > > messages using a new `warning()` API in the modify-env plugin. The
>> > > > warning API was defined in a new file, `warning.cpp`, but we failed to
>> > > > add the new file to our Makefile. This was later fixed in commit
>> > > > 00f5f74 around late Feb (see: https://github.com/dmtcp/dmtcp/commit/00f5f74),
>> > > > but the fix didn't make it into 2.5.0-rc1, which was released earlier
>> > > > in Feb.
>> > > >
>> > > > Thanks,
>> > > > Rohan
>> > >
>> >
>> >
>> >
>> > --
>> > William Fox
>> >
>> > Lawrence Berkeley National Laboratory
>> > Computational Research Division
>>
>>
>
>
> --
> William Fox
>
> Lawrence Berkeley National Laboratory
> Computational Research Division
>
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>