On Sun, 11 Mar 2018 13:10:21 -0400, Gregory Szorc <gregory.sz...@gmail.com> wrote:

On Sun, Mar 11, 2018 at 7:52 AM, Yuya Nishihara <y...@tcha.org> wrote:

On Sun, 11 Mar 2018 08:55:41 -0400, Matt Harbison wrote:
> > On Mar 11, 2018, at 5:43 AM, Yuya Nishihara <y...@tcha.org> wrote:
> >
> >> On Sun, 11 Mar 2018 00:04:45 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbi...@yahoo.com>
> >> # Date 1520744281 18000
> >> #      Sat Mar 10 23:58:01 2018 -0500
> >> # Node ID 1145671119a40e82932f63f83ad4049727416174
> >> # Parent  963b4223d14fa419df2a82fbe47cd55075707b6a
> >> wireproto: explicitly flush stdio to prevent stalls on Windows
> >>
> >> This is the key to fixing the hangs on Windows in D2720[1].  I put
flushes in a
> >> bunch of other places that didn't help, but I suspect that's more a
lack of test
> >> coverage than anything else.
> >>
> >> Chasing down stuff like this is pretty painful. I'm wondering if we
can put a
> >> proxy around sys.stderr (and sys.stdout?) on Windows (only when
> >> that will flush on every write (or at least every write with a '\n').
> >>
> >> [1] https://www.mercurial-scm.org/pipermail/mercurial-devel/
> >>
> >> diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
> >> --- a/mercurial/debugcommands.py
> >> +++ b/mercurial/debugcommands.py
> >> @@ -2769,6 +2769,8 @@ def debugwireproto(ui, repo, **opts):
> >>
> >>     # Now perform actions based on the parsed wire language
> >>     for action, lines in blocks:
> >> +        util.stdout.flush()
> >
> > This one is suspicious. Can you clarify which data is flushed? I think
> > fileobjectobserver should flush log per method call.
> Yeah, this made no sense to me. I would have expected that it needed to be flushed on the server side. But, here’s where it stalls without this:
> https://www.mercurial-scm.org/pipermail/mercurial-devel/

Ok, if the server stalls, that might be because of the --noreadstderr
The client must read stderr so the server can write into the pipe.

That's probably it. We're using unbuffered pipes to the process running the SSH protocol server. So we could very easily get into a deadlock situation
if the client isn't reading from a pipe when the other end writes to it.

The other issue I think is that there's no line buffering on Windows- it's full buffered. The missing data that caused the hangs was actually missing before D2720 too. But the deterministic read and hang made it harder to ignore. :)

Maybe the correct thing to do is have the client always read from stderr
but not print the data unless explicitly instructed to. That would make the code a bit more complicated. But I'm not sure how else to get deterministic
results here, since the very nature of polling two pipes is
non-deterministic. The only way to be deterministic is to perform blocking
reads for deterministic amounts - either readline() or read(n).

I know that (at least on Windows), writing to stdout/stderr can hang if the other side isn't reading the buffer, so it's probably a good idea to tackle at some point. But I don't think we are close to 4K of stderr (I think that's the default size- maybe it's only 1K). The problem here was too little data, so it wasn't flushed.

I'm sorry this test is causing trouble for you, Matt. Hopefully the end
state of having low-level test coverage of the wire protocol is worth it.
FWIW I think the tests have already been useful for finding oddities with
stdio stream handling during hooks and identifying wonky behavior of the
existing wire protocol.

Not a problem, and I agree that the low level tests are better. So hopefully that didn't come off as a criticism of your work, because it wasn't meant to be. I was just thinking out loud that the full buffering on Windows is a PITA, and maybe there's something we can do other than sprinkling flushes all over. The other patch with the hook flushing was something I had to hunt down a year or two ago (I remember wondering if it should go in the finally block, but the tests passed, and I wasn't sure if it would change the order there). And there were several other cases of needing a random flush here or there to get Windows tests aligned with everything else.

Wrapping stderr in a proxy is beyond my python ability, and IDK what the downsides are. So I guess we can just watch and see if any more problems pop up in the short term? But I don't have a lot of confidence that we can find every place that needs to be flushed manually.
Mercurial-devel mailing list

Reply via email to