Does the following patch work as well?
Index: event.c
===================================================================
--- event.c (revision 1613354)
+++ event.c (working copy)
@@ -1973,9 +1973,12 @@
apr_pollset_remove(event_pollset, baton->pfds[i]);
}
+ apr_pollset_remove(event_pollset, out_pfd);
push_timer2worker(te);
}
- apr_pollset_remove(event_pollset, out_pfd);
+ else {
+ apr_pollset_remove(event_pollset, out_pfd);
+ }
}
out_pfd++;
num--;
Regards
RĂ¼diger
Edward Lu wrote:
> I think I've found the issue; there's an extraneous apr_pollset_remove() call
> in event.c:1978. Relevant code
> (double-slash comments are mine):
>
> /* We only signal once per N sockets with this baton */
> if (!(baton->signaled)) {
> baton->signaled = 1;
> te = event_get_timer_event(-1 /* fake timer */,
> baton->cbfunc,
> baton->user_baton,
> 0, /* don't insert it */
> NULL /* no associated socket callback */);
> /* remove other sockets in my set */
> // says remove "other" sockets, but it really removes all
> for (i = 0; i < baton->nsock; i++) {
> rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
> }
>
> push_timer2worker(te);
> }
> // This causes the problem; if the sockets are processed and re-added
> // before this is called, the backend socket gets re-removed, never to
> // be seen again (until the client sends something)
> apr_pollset_remove(event_pollset, out_pfd);
>
> Seems like the solution is just to remove the extra apr_pollset_remove(); all
> sockets are removed from the pollset when
> one of them is signalled. If another one was signalled in the same poll, it
> will enter this block and simply do nothing
> (because of the baton->signaled check). I gave it a quick test and it looked
> like the problem was solved.
>
>
> On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaos...@gmail.com
> <mailto:chaos...@gmail.com>> wrote:
>
> I got it to break with these python scripts, as well as with my own test
> client and server (browser javascript and
> node.js). Strange though, I can't reproduce it with only 2 clients on
> either of those programs; I have to get 3
> clients connected at once. Past that, I see exactly the behavior you
> describe.
>
> I agree that it seems to be some sort of timing issue - I've been trying
> to put trace statements into the HTTPD to
> see what information I can get, but once I do that it starts behaving as
> expected. Still investigating...
>
>
> On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <steve.zw...@watchguard.com
> <mailto:steve.zw...@watchguard.com>> wrote:
>
> Another tidbit of information ...
>
> I attempted to see what was going on by running httpd in the
> foreground with strace. When I did that, the tests
> worked perfectly. I guess this suggests it is some sort of timing
> issue. Running httpd in the foreground without
> trace exhibited the problem.
>
> Here's the output of strace during the test. I sent the string
> "hhhhhhh" from client 1 and it was echoed back by
> the server to both clients, as expected.
>
> [pid 15451] <... futex resumed> ) = 0
> [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752,
> u64=144017752}}}, 2, 0) = 1
> [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
> [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
> [pid 15451] read(12, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
> [pid 15451] read(12, 0xa96084b0, 8000) = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid 15451] epoll_wait(13, {}, 2, 0) = 0
> [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472,
> u64=144044472}}) = 0
> [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496,
> u64=144044496}}) = 0
> [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
> [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856,
> u64=2843761856}}}, 2, 0) = 1
> [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
> [pid 15451] read(9, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
> [pid 15451] read(9, 0xa96084b0, 8000) = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid 15451] epoll_wait(10, {}, 2, 0) = 0
> [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920,
> u64=2843796920}}) = 0
> [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944,
> u64=2843796944}}) = 0
> [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
> [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished
> ...>
>
>
> - Steve
>
> -----Original Message-----
> From: Steve Zweep
> Sent: Monday, July 21, 2014 11:56 AM
> To: 'dev@httpd.apache.org <mailto:dev@httpd.apache.org>'
> Subject: RE: Question about async mod_proxy_wstunnel and threads
>
> My client and server are based on sample code included with the ws4py
> package
> (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and the
> ws4py package contains a CherryPy-based
> server (attached).
>
> The client is a variation of the echo_client.py sample code also
> included in the ws4py package.
>
> In terms of failure rate, I'd say it is about 90%. Often I start 2
> clients and initially it works OK, but then
> stops after a message or two. Once it has failed, I can send any
> number of messages from the one client and
> continue to see the server echo them back, but no message is received
> by the second. When I then send a message
> from the second client I see a flood of messages from the server that
> had been queued.
>
> - Steve
>
>
> -----Original Message-----
> From: Eric Covener [mailto:cove...@gmail.com
> <mailto:cove...@gmail.com>]
> Sent: Saturday, July 19, 2014 8:16 AM
> To: Apache HTTP Server Development List
> Subject: Re: Question about async mod_proxy_wstunnel and threads
>
> I guess it is kind of telling that we could still respond to the
> write from client 2, there are probably more
> potential errors that make us lose track of both sides simultaneously.
>
> Can you share your ws client and server if they're easy to use? My
> little sample doesn't spray that way.
>
> I'm also curious approximately how often it fails this way.
>
> On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <ylavic....@gmail.com
> <mailto:ylavic....@gmail.com>> wrote:
> > Probably a false trail, trunk at r1611741 is up to date, I think
> 1.5.x
> > won't help.
> >
> > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
> > <steve.zw...@watchguard.com <mailto:steve.zw...@watchguard.com>>
> wrote:
> >> Hi Yann,
> >>
> >> The test I ran today was built with APR from trunk (r1611741). I
> see that r1605769 modifies apr_skiplist.c
> and there has been a lot of activity in the trunk version of that
> code lately. I can try with the 1.5.x branch
> code to see if it makes a difference. Probably won't get to this
> before Monday though.
> >>
> >> - Steve
> >>
> >> -----Original Message-----
> >> From: Yann Ylavic [mailto:ylavic....@gmail.com
> <mailto:ylavic....@gmail.com>]
> >> Sent: Friday, July 18, 2014 4:51 PM
> >> To: httpd
> >> Subject: Re: Question about async mod_proxy_wstunnel and threads
> >>
> >> Hi Steve,
> >>
> >> can you still reproduce with the latest APR 1.5.x, notably
> containing this fix: http://svn.apache.org/r1605769.
> >> I don't think there is a released version with this patch...
> >>
> >> Regards,
> >> Yann.
> >>
> >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep
> <steve.zw...@watchguard.com <mailto:steve.zw...@watchguard.com>>
> wrote:
> >>> I've attached annotated logs that show the issues I described.
> Both scenarios have ProxyWebsocketAsync
> turned on. The first does not use the AsyncDelay and shows how server
> messages stall and are not delivered until
> the client polls. The second has ProxyWebsocketAsyncDelay set to 100.
> In that case, message processing works
> properly, but threads are held open and there is no sign of async
> processing.
> >>>
> >>> Since my build and execution environment were somewhat
> non-standard, I repeated all the tests today on stock
> Ubuntu 14.04, with a fresh checkout and build of httpd and apr trunk
> code. The same results were observed.
> >>>
> >>> - Steve
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> ________________________________________
> >>> From: Eric Covener [cove...@gmail.com <mailto:cove...@gmail.com>]
> >>> Sent: July 17, 2014 9:15 PM
> >>> To: Apache HTTP Server Development List
> >>> Subject: Re: Question about async mod_proxy_wstunnel and threads
> >>>
> >>> I am having trouble seeing it mis-behave. w/ Async and
> AsyncDelay, I
> >>> am seeing the expected trace messages and when I look at
> backtraces
> >>> of httpd I can see zero threads in wstunnel . If I send a server
> >>> msg, I get it ASAP in the client -- and then I see 1 thread in
> poll
> >>> for the right couple of seconds
> >>>
> >>> Can you grab trace at e.g.
> >>>
> >>> LogLevel INFO proxy_wstunnel_module:trace8
> >>>
> >>> And annotate the timing a bit for what you do in the client? Is
> it possible you have an un-updated trunk
> from several weeks ago? There was an optimization put in and backed
> out that might have broke some of these
> same things for a very short window.
> >>>
>
>
>
> --
> Eric Covener
> cove...@gmail.com <mailto:cove...@gmail.com>
>
>
>