Previous comment was only on the else block; I just noticed the extra apr_pollset_remove() here:
apr_pollset_remove(event_pollset, baton->pfds[i]); } + apr_pollset_remove(event_pollset, out_pfd); push_timer2worker(te); } It shouldn't be necessary either, since out_pfd is included in baton->pfds; it gets removed in the for loop just before. On Fri, Jul 25, 2014 at 1:31 PM, Edward Lu <chaos...@gmail.com> wrote: > Re: Patch; yes, I think that works as well, but I'm still not sure it > makes sense. If two or more sockets are readable, the first one that's > processed will remove itself and the other ones from the pollset; the rest > will try to remove themselves from the pollset, but nothing will happen > (since they've already been removed by the processing of the first one). > > > On Fri, Jul 25, 2014 at 11:50 AM, Ruediger Pluem <rpl...@apache.org> > wrote: > >> Does the following patch work as well? >> >> Index: event.c >> =================================================================== >> --- event.c (revision 1613354) >> +++ event.c (working copy) >> @@ -1973,9 +1973,12 @@ >> apr_pollset_remove(event_pollset, >> baton->pfds[i]); >> } >> >> + apr_pollset_remove(event_pollset, out_pfd); >> push_timer2worker(te); >> } >> - apr_pollset_remove(event_pollset, out_pfd); >> + else { >> + apr_pollset_remove(event_pollset, out_pfd); >> + } >> } >> out_pfd++; >> num--; >> >> Regards >> >> RĂ¼diger >> >> Edward Lu wrote: >> > I think I've found the issue; there's an extraneous >> apr_pollset_remove() call in event.c:1978. Relevant code >> > (double-slash comments are mine): >> > >> > /* We only signal once per N sockets with this baton */ >> > if (!(baton->signaled)) { >> > baton->signaled = 1; >> > te = event_get_timer_event(-1 /* fake timer */, >> > baton->cbfunc, >> > baton->user_baton, >> > 0, /* don't insert it */ >> > NULL /* no associated socket >> callback */); >> > /* remove other sockets in my set */ >> > // says remove "other" sockets, but it really removes all >> > for (i = 0; i < baton->nsock; i++) { >> > rc = apr_pollset_remove(event_pollset, baton->pfds[i]); >> > } >> > >> > push_timer2worker(te); >> > } >> > // This causes the problem; if the sockets are processed and >> re-added >> > // before this is called, the backend socket gets re-removed, >> never to >> > // be seen again (until the client sends something) >> > apr_pollset_remove(event_pollset, out_pfd); >> > >> > Seems like the solution is just to remove the extra >> apr_pollset_remove(); all sockets are removed from the pollset when >> > one of them is signalled. If another one was signalled in the same >> poll, it will enter this block and simply do nothing >> > (because of the baton->signaled check). I gave it a quick test and it >> looked like the problem was solved. >> > >> > >> > On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaos...@gmail.com >> <mailto:chaos...@gmail.com>> wrote: >> > >> > I got it to break with these python scripts, as well as with my own >> test client and server (browser javascript and >> > node.js). Strange though, I can't reproduce it with only 2 clients >> on either of those programs; I have to get 3 >> > clients connected at once. Past that, I see exactly the behavior >> you describe. >> > >> > I agree that it seems to be some sort of timing issue - I've been >> trying to put trace statements into the HTTPD to >> > see what information I can get, but once I do that it starts >> behaving as expected. Still investigating... >> > >> > >> > On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep < >> steve.zw...@watchguard.com <mailto:steve.zw...@watchguard.com>> wrote: >> > >> > Another tidbit of information ... >> > >> > I attempted to see what was going on by running httpd in the >> foreground with strace. When I did that, the tests >> > worked perfectly. I guess this suggests it is some sort of >> timing issue. Running httpd in the foreground without >> > trace exhibited the problem. >> > >> > Here's the output of strace during the test. I sent the string >> "hhhhhhh" from client 1 and it was echoed back by >> > the server to both clients, as expected. >> > >> > [pid 15451] <... futex resumed> ) = 0 >> > [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0 >> > [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, >> u64=144017752}}}, 2, 0) = 1 >> > [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10 >> > [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0 >> > [pid 15451] read(12, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource >> temporarily unavailable) >> > [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10 >> > [pid 15451] read(12, 0xa96084b0, 8000) = -1 EAGAIN (Resource >> temporarily unavailable) >> > [pid 15451] epoll_wait(13, {}, 2, 0) = 0 >> > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, >> {u32=144044472, u64=144044472}}) = 0 >> > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, >> {u32=144044496, u64=144044496}}) = 0 >> > [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0 >> > [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, >> u64=2843761856}}}, 2, 0) = 1 >> > [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10 >> > [pid 15451] read(9, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource >> temporarily unavailable) >> > [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10 >> > [pid 15451] read(9, 0xa96084b0, 8000) = -1 EAGAIN (Resource >> temporarily unavailable) >> > [pid 15451] epoll_wait(10, {}, 2, 0) = 0 >> > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, >> {u32=2843796920, u64=2843796920}}) = 0 >> > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, >> {u32=2843796944, u64=2843796944}}) = 0 >> > [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0 >> > [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL >> <unfinished ...> >> > >> > >> > - Steve >> > >> > -----Original Message----- >> > From: Steve Zweep >> > Sent: Monday, July 21, 2014 11:56 AM >> > To: 'dev@httpd.apache.org <mailto:dev@httpd.apache.org>' >> > Subject: RE: Question about async mod_proxy_wstunnel and threads >> > >> > My client and server are based on sample code included with the >> ws4py package >> > (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and >> the ws4py package contains a CherryPy-based >> > server (attached). >> > >> > The client is a variation of the echo_client.py sample code >> also included in the ws4py package. >> > >> > In terms of failure rate, I'd say it is about 90%. Often I >> start 2 clients and initially it works OK, but then >> > stops after a message or two. Once it has failed, I can send >> any number of messages from the one client and >> > continue to see the server echo them back, but no message is >> received by the second. When I then send a message >> > from the second client I see a flood of messages from the >> server that had been queued. >> > >> > - Steve >> > >> > >> > -----Original Message----- >> > From: Eric Covener [mailto:cove...@gmail.com <mailto: >> cove...@gmail.com>] >> > Sent: Saturday, July 19, 2014 8:16 AM >> > To: Apache HTTP Server Development List >> > Subject: Re: Question about async mod_proxy_wstunnel and threads >> > >> > I guess it is kind of telling that we could still respond to >> the write from client 2, there are probably more >> > potential errors that make us lose track of both sides >> simultaneously. >> > >> > Can you share your ws client and server if they're easy to use? >> My little sample doesn't spray that way. >> > >> > I'm also curious approximately how often it fails this way. >> > >> > On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic < >> ylavic....@gmail.com <mailto:ylavic....@gmail.com>> wrote: >> > > Probably a false trail, trunk at r1611741 is up to date, I >> think 1.5.x >> > > won't help. >> > > >> > > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep >> > > <steve.zw...@watchguard.com <mailto: >> steve.zw...@watchguard.com>> wrote: >> > >> Hi Yann, >> > >> >> > >> The test I ran today was built with APR from trunk >> (r1611741). I see that r1605769 modifies apr_skiplist.c >> > and there has been a lot of activity in the trunk version of >> that code lately. I can try with the 1.5.x branch >> > code to see if it makes a difference. Probably won't get to >> this before Monday though. >> > >> >> > >> - Steve >> > >> >> > >> -----Original Message----- >> > >> From: Yann Ylavic [mailto:ylavic....@gmail.com <mailto: >> ylavic....@gmail.com>] >> > >> Sent: Friday, July 18, 2014 4:51 PM >> > >> To: httpd >> > >> Subject: Re: Question about async mod_proxy_wstunnel and >> threads >> > >> >> > >> Hi Steve, >> > >> >> > >> can you still reproduce with the latest APR 1.5.x, notably >> containing this fix: http://svn.apache.org/r1605769. >> > >> I don't think there is a released version with this patch... >> > >> >> > >> Regards, >> > >> Yann. >> > >> >> > >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep < >> steve.zw...@watchguard.com <mailto:steve.zw...@watchguard.com>> >> > wrote: >> > >>> I've attached annotated logs that show the issues I >> described. Both scenarios have ProxyWebsocketAsync >> > turned on. The first does not use the AsyncDelay and shows how >> server messages stall and are not delivered until >> > the client polls. The second has ProxyWebsocketAsyncDelay set >> to 100. In that case, message processing works >> > properly, but threads are held open and there is no sign of >> async processing. >> > >>> >> > >>> Since my build and execution environment were somewhat >> non-standard, I repeated all the tests today on stock >> > Ubuntu 14.04, with a fresh checkout and build of httpd and apr >> trunk code. The same results were observed. >> > >>> >> > >>> - Steve >> > >>> >> > >>> >> > >>> >> > >>> -----Original Message----- >> > >>> ________________________________________ >> > >>> From: Eric Covener [cove...@gmail.com <mailto: >> cove...@gmail.com>] >> > >>> Sent: July 17, 2014 9:15 PM >> > >>> To: Apache HTTP Server Development List >> > >>> Subject: Re: Question about async mod_proxy_wstunnel and >> threads >> > >>> >> > >>> I am having trouble seeing it mis-behave. w/ Async and >> AsyncDelay, I >> > >>> am seeing the expected trace messages and when I look at >> backtraces >> > >>> of httpd I can see zero threads in wstunnel . If I send a >> server >> > >>> msg, I get it ASAP in the client -- and then I see 1 thread >> in poll >> > >>> for the right couple of seconds >> > >>> >> > >>> Can you grab trace at e.g. >> > >>> >> > >>> LogLevel INFO proxy_wstunnel_module:trace8 >> > >>> >> > >>> And annotate the timing a bit for what you do in the >> client? Is it possible you have an un-updated trunk >> > from several weeks ago? There was an optimization put in and >> backed out that might have broke some of these >> > same things for a very short window. >> > >>> >> > >> > >> > >> > -- >> > Eric Covener >> > cove...@gmail.com <mailto:cove...@gmail.com> >> > >> > >> > >> > >