Sorry for the extra email, but going through the debug output I came across
the following difference, which I think is relevant. I am not sure what is
causing it yet.

The lines marked in bold (wrapped in asterisks) show where the output
diverges, which helps explain why the ssh process goes defunct.  More to look
into, of course...

######################################
### WITH --MODIFY-ENV PLUGIN #########
######################################

[45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling
user function'
     virtualTid = 45020
[45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
     th->tid = 118062
     th->virtual_tid = 45020
[45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user
function'
     dmtcp_gettid() = 45020
*[46000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook;
REASON='exit() in progress, disconnecting from dmtcp coordinator'*
[46000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill
checkpinthread'
     ckptThread->tid = 117995
[46000] TRACE at dmtcpworker.cpp:333 in cleanupWorker;
REASON='disconnecting from dmtcp coordinator'
[40000] TRACE at socketwrappers.cpp:59 in socket; REASON='socket created'
     ret = 13
     domain = 2
     type = 1
     protocol = 0
[40000] TRACE at socketconnection.cpp:196 in TcpConnection;
REASON='Creating TcpConnection.'
     id() = 1c81169a0ee7cfa2-40000-57363f26(99543)
     domain = 2
     type = 1
     protocol = 0

######################################
### WITHOUT --MODIFY-ENV PLUGIN ######
######################################

[45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling
user function'
     virtualTid = 45020
[45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
     th->tid = 117444
     th->virtual_tid = 45020
[45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling user
function'
     dmtcp_gettid() = 45020
*[46000] TRACE at socketwrappers.cpp:152 in process_accept;
REASON='accepted incoming connection'*
*     sockfd = 9*
*     con->id() = 1c81169a0ee7cfa2-46000-57363ea3(99548)*
[45000] TRACE at socketwrappers.cpp:152 in process_accept; REASON='accepted
incoming connection'
     sockfd = 19
     con->id() = 1c81169a0ee7cfa2-45000-57363ea3(99540)
[45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling
libc:__clone'
[45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread
created'
     tid = 117492
[45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling
user function'
     virtualTid = 45022
[45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
     th->tid = 117492
     th->virtual_tid = 45022
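
In case it helps anyone repeat this comparison, here is a minimal Python
sketch. It is not part of DMTCP, and the names trace_events and diff_traces
are made up for illustration. It assumes the TRACE format shown above, reduces
each record to its virtual pid, function and REASON, and diffs the two runs.
Real tids and connection ids are dropped on purpose, since they differ between
runs anyway.

```python
import difflib
import re

# Assumed record shape, taken from the output above:
#   [pid] TRACE at file.cpp:line in function; REASON='text'
RECORD = re.compile(r"\[(\d+)\] TRACE at \S+?:\d+ in (\w+); REASON='([^']*)'")


def trace_events(text):
    """Reduce a trace dump to one 'pid function: reason' string per record."""
    # Strip mail quote markers and indentation at the start of each line.
    text = re.sub(r"(?m)^[>\s]+", "", text)
    # Drop the '*' bold markers and re-join records wrapped by the mailer.
    flat = " ".join(text.replace("*", "").split())
    return [f"[{pid}] {func}: {reason}"
            for pid, func, reason in RECORD.findall(flat)]


def diff_traces(with_plugin, without_plugin):
    """Return only the events that differ between the two runs."""
    diff = difflib.unified_diff(
        trace_events(with_plugin), trace_events(without_plugin),
        "with", "without", lineterm="", n=0,
    )
    return [d for d in diff
            if d[:1] in ("+", "-") and not d.startswith(("+++", "---"))]
```

Saving each run to a file (any file names will do) and passing the two file
contents to diff_traces should list the diverging events, with "-" for the
run with the plugin and "+" for the run without it.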

On Fri, May 13, 2016 at 11:25 AM, William Fox <w...@lbl.gov> wrote:

> Sure no worries on the issue submission or format taken.
>
> The 2.5 branch as mentioned works in serial for modifyenv but not
> distributed.  Thanks for the pointers and I'm looking forward to a modify
> env that is compatible with distributed mode.  I'll let you know if I
> manage to find a fix myself.   Thanks again.
> On May 13, 2016 11:21 AM, "Gene Cooperman" <g...@ccs.neu.edu> wrote:
>
> Hi William.  So,
>      git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
> will probably work for you.  We'll set up a 2.5.0-rc2 for future users.
>
> For the additional note, below, do you mind if I copy your report
> from the latest e-mail into a github issue?  By default, I'll give
> credit to you as "William Fox" (no e-mail address).  But I can do that part
> however you wish.
>
> We're between spring and summer semester here, but we hope to look
> at this seriously in the summer.
>
> Thanks,
> - Gene
>
> On Thu, May 12, 2016 at 05:24:20PM -0700, William Fox wrote:
> > CentOS version :
> > LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> > Distributor ID: CentOS
> > Description: CentOS release 6.4 (Final)
> > Release: 6.4
> > Codename: Final
> >
> > Also: all of the above was correct. I discovered the missing warning.cpp
> > in the makefiles but was not sure how to fix it.
> >
> > An additional note, which may be a known compatibility error:
> >
> > with the following installed
> >   git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
> > the expected behavior occurs for --ib in distributed mode and --modify-env
> > in serial, but using --ib and --modify-env, or just --modify-env, in a
> > distributed environment causes [dmtcp_ssh]<defunct> when workers are
> > spawned from the master through TCP ssh calls.
> >
> > Without --modify-env it performs correctly.  Using --enable-debug on
> > compile gives:
> > [49000] TRACE at dmtcpworker.cpp:453 in waitForStage1Suspend;
> > REASON='running'
> > [49000] TRACE at dmtcpworker.cpp:412 in waitForCoordinatorMsg;
> > REASON='waiting for SUSPEND message'
> > [46000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook;
> > REASON='exit() in progress, disconnecting from dmtcp coordinator'
> > [46000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill
> > checkpinthread'
> >      ckptThread->tid = 112446
> > [46000] TRACE at dmtcpworker.cpp:333 in cleanupWorker;
> > REASON='disconnecting from dmtcp coordinator'
> > [47000] TRACE at coordinatorapi.cpp:70 in dmtcp_CoordinatorAPI_EventHook;
> > REASON='exit() in progress, disconnecting from dmtcp coordinator'
> > [47000] TRACE at threadlist.cpp:250 in killCkpthread; REASON='Kill
> > checkpinthread'
> >      ckptThread->tid = 112449
> > [47000] TRACE at dmtcpworker.cpp:333 in cleanupWorker;
> > REASON='disconnecting from dmtcp coordinator'
> > [40000] TRACE at socketwrappers.cpp:59 in socket; REASON='socket created'
> >      ret = 13
> >      domain = 2
> >      type = 1
> >      protocol = 0
> > [40000] TRACE at socketconnection.cpp:196 in TcpConnection;
> > REASON='Creating TcpConnection.'
> >      id() = 3d1ce25df201d031-40000-57351cc4(99546)
> >      domain = 2
> >      type = 1
> >      protocol = 0
> > [40000] TRACE at socketwrappers.cpp:89 in connect; REASON='connected'
> >      sockfd = 13
> >      con->id() = 3d1ce25df201d031-40000-57351cc4(99546)
> > [45000] TRACE at socketwrappers.cpp:152 in process_accept;
> > REASON='accepted incoming connection'
> >      sockfd = 19
> >      con->id() = 3d1ce25df201d031-45000-57351d2e(99537)
> > [45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling
> > libc:__clone'
> > [45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread
> > created'
> >      tid = 112538
> > [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling
> > user function'
> >      virtualTid = 45018
> > [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
> >      th->tid = 112538
> >      th->virtual_tid = 45018
> > [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling
> > user function'
> >      dmtcp_gettid() = 45018
> > [45000] TRACE at pid_miscwrappers.cpp:149 in __clone; REASON='Calling
> > libc:__clone'
> > [45000] TRACE at pid_miscwrappers.cpp:158 in __clone; REASON='New thread
> > created'
> >      tid = 112539
> > [45000] TRACE at pid_miscwrappers.cpp:109 in clone_start; REASON='Calling
> > user function'
> >      virtualTid = 45020
> > [45000] TRACE at threadlist.cpp:237 in updateTid; REASON='starting thread'
> >      th->tid = 112539
> >      th->virtual_tid = 45020
> > [45000] TRACE at threadwrappers.cpp:67 in clone_start; REASON='Calling
> > user function'
> >      dmtcp_gettid() = 45020
> >
> >
> > On Thu, May 12, 2016 at 3:54 PM, Gene Cooperman <g...@ccs.neu.edu> wrote:
> >
> > > William,
> > >     Could you try out the latest development branch:
> > >   git clone --branch=2.5 https://github.com/dmtcp/dmtcp.git
> > > This will give you the commit 00f5f74 that Rohan refers to.
> > > Please let us know if this works for you.
> > >
> > > Kapil,
> > >     I think the conclusion here is that it's time to release:
> > >   dmtcp-2.5.0-rc2
> > > Could you do that when you have a chance?
> > >
> > > Thanks,
> > > - Gene
> > >
> > > On Thu, May 12, 2016 at 06:46:56PM -0400, Rohan Garg wrote:
> > > > Gene and William,
> > > >
> > > > I'm not sure if it's the same bug that we fixed in commit 00f5f74. If
> > > > it is then here's the story behind it:
> > > >
> > > > > The bug was introduced in commit 3a53d4f, where we added some warning
> > > > > messages using a new `warning()` API in the modify-env plugin. The
> > > > > warning API was defined in a new file `warning.cpp`, but we failed to
> > > > > add the new file to our Makefile. This was later fixed in commit
> > > > > 00f5f74 around late Feb (see:
> > > > > https://github.com/dmtcp/dmtcp/commit/00f5f74), but it didn't make it
> > > > > to 2.5.0-rc1, which was released earlier in Feb.
> > > >
> > > > Thanks,
> > > > Rohan
> > >
> >
> >
> >
> > --
> > William Fox
> >
> > Lawrence Berkeley National Laboratory
> > Computational Research Division
>
>


-- 
William Fox

Lawrence Berkeley National Laboratory
Computational Research Division
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
