On February 18, 2019 15:47, I wrote:
> On February 18, 2019 15:41, Johannes Schindelin wrote:
> > On Thu, 14 Feb 2019, Randall S. Becker wrote:
> >
> > > On February 14, 2019 17:39, Junio C Hamano wrote:
> > > > To: Randall S. Becker <rsbec...@nexbridge.com>
> > > > Cc: 'Johannes Schindelin via GitGitGadget'
> > > > <gitgitgad...@gmail.com>; git@vger.kernel.org; 'Max Kirillov'
> > > > <m...@max630.net>
> > > > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in
> > > > v2.21.0-rc1
> > > >
> > > > "Randall S. Becker" <rsbec...@nexbridge.com> writes:
> > > >
> > > > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > > > patch, so our Pipeline still hangs. I'm glad it's better on
> > > > > Azure, but I don't think this actually addresses the root cause
> > > > > of the
> hang.
> > > >
> > > > Sigh.
> > > >
> > > > > possible this is not the test that is failing, but actually the
> > > > > git-http-backend? The code is not in a loop, if that helps. It
> > > > > is not consuming any significant cycles. I don't know that part
> > > > > of the code at all, sadly. The code is here:
> > > > >
> > > > > * in the operating system from here up *
> > > > >   cleanup_children + 0x5D0 (UCr)
> > > > >   cleanup_children_on_exit + 0x70 (UCr)
> > > > >   git_atexit_dispatch + 0x200 (UCr)
> > > > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > > > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > > > >   exit + 0x2A0 (DLL zcrtldll)
> > > > >   die_webcgi + 0x240 (UCr)
> > > > >   die_errno + 0x360 (UCr)
> > > > >   write_or_die + 0x1C0 (UCr)
> > > > >   end_headers + 0x1A0 (UCr)
> > > > >   die_webcgi + 0x220 (UCr)
> > > > >   die + 0x320 (UCr)
> > > > >   inflate_request + 0x520 (UCr)
> > > > >   run_service + 0xC20 (UCr)
> > > > >   service_rpc + 0x530 (UCr)
> > > > >   cmd_main + 0xD00 (UCr)
> > > > >   main + 0x190 (UCr)
> > > > >
> > > > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten
> > > > > or neglected somewhere between the test, perl, and git-http-
> backend.
> > > >
> > > > So we are trying to die(), which actually happens in die_webcgi(),
> > > > and
> > > then try
> > > > to write some message _but_ notice an error inside
> > > > write_or_dir() and try to exit because we do not want to recurse
> > > > forever trying to die, giving a message to say how/why we died,
> > > > and die because failing to give that message, forever.
> > > >
> > > > But in our attempt to exit(), we try to "cleanup children" and
> > > > that is
> > > what gets
> > > > stuck.
> > > >
> > > > One big difference before and after the /dev/zero change is that
> > > > the
> > > process
> > > > is now on a downstream of the pipe.  If we prepare a large file
> > > > with a
> > > finite
> > > > size full of NULs and replace /dev/null with it, instead of
> > > > feeding NULs
> > > from
> > > > the pipe, would it change the equation?
> > >
> > > Doubtful. The processes are still around, and are waiting on read
> > > but not actively reading (CPU time is not going up, so we're not
> > > reading an infinite stream). To me, this is a pipe situation where
> > > there is simply nothing waiting on the pipe (maybe a flush
> > > missing?). I'm grasping are straws without knowing the actual
> > > process architecture of
> the
> > test to debug it.
> >
> > So could you try with this patch?
> >
> > -- snipsnap --
> > diff --git a/http-backend.c b/http-backend.c index
> > d5cea0329a..7c1b4a2555
> > 100644
> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> > int out, int buffer_input, ss
> >
> >  done:
> >     git_inflate_end(&stream);
> > +   close(0);
> >     close(out);
> >     free(full_request);
> >  }
> 
> In isolation or with the other fixes associated with t5562? Or, which
baseline
> commit should I use? 8989e1950a or d92031209a or some other?

It works fine based off of d92031209a - the test actually runs a bit faster
based on the wall clock.

Reply via email to