I got it to break with these python scripts, as well as with my own test client and server (browser javascript and node.js). Strange though, I can't reproduce it with only 2 clients on either of those programs; I have to get 3 clients connected at once. Past that, I see exactly the behavior you describe.
I agree that it seems to be some sort of timing issue - I've been trying to put trace statements into the HTTPD to see what information I can get, but once I do that it starts behaving as expected. Still investigating... On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <[email protected]> wrote: > Another tidbit of information ... > > I attempted to see what was going on by running httpd in the foreground > with strace. When I did that, the tests worked perfectly. I guess this > suggests it is some sort of timing issue. Running httpd in the foreground > without trace exhibited the problem. > > Here's the output of strace during the test. I sent the string "hhhhhhh" > from client 1 and it was echoed back by the server to both clients, as > expected. > > [pid 15451] <... futex resumed> ) = 0 > [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0 > [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, > 0) = 1 > [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10 > [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0 > [pid 15451] read(12, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource temporarily > unavailable) > [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10 > [pid 15451] read(12, 0xa96084b0, 8000) = -1 EAGAIN (Resource temporarily > unavailable) > [pid 15451] epoll_wait(13, {}, 2, 0) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, > u64=144044472}}) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, > u64=144044496}}) = 0 > [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0 > [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, > 2, 0) = 1 > [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10 > [pid 15451] read(9, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource temporarily > unavailable) > [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10 > [pid 15451] read(9, 0xa96084b0, 8000) = -1 EAGAIN (Resource temporarily > unavailable) > [pid 15451] epoll_wait(10, {}, 2, 0) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, > u64=2843796920}}) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, > u64=2843796944}}) = 0 > [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0 > [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished ...> > > > - Steve > > -----Original Message----- > From: Steve Zweep > Sent: Monday, July 21, 2014 11:56 AM > To: '[email protected]' > Subject: RE: Question about async mod_proxy_wstunnel and threads > > My client and server are based on sample code included with the ws4py > package (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and > the ws4py package contains a CherryPy-based server (attached). > > The client is a variation of the echo_client.py sample code also included > in the ws4py package. > > In terms of failure rate, I'd say it is about 90%. Often I start 2 clients > and initially it works OK, but then stops after a message or two. Once it > has failed, I can send any number of messages from the one client and > continue to see the server echo them back, but no message is received by > the second. When I then send a message from the second client I see a flood > of messages from the server that had been queued. > > - Steve > > > -----Original Message----- > From: Eric Covener [mailto:[email protected]] > Sent: Saturday, July 19, 2014 8:16 AM > To: Apache HTTP Server Development List > Subject: Re: Question about async mod_proxy_wstunnel and threads > > I guess it is kind of telling that we could still respond to the write > from client 2, there are probably more potential errors that make us lose > track of both sides simultaneously. > > Can you share your ws client and server if they're easy to use? My little > sample doesn't spray that way. > > I'm also curious approximately how often it fails this way. > > On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <[email protected]> wrote: > > Probably a false trail, trunk at r1611741 is up to date, I think 1.5.x > > won't help. > > > > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep > > <[email protected]> wrote: > >> Hi Yann, > >> > >> The test I ran today was built with APR from trunk (r1611741). I see > that r1605769 modifies apr_skiplist.c and there has been a lot of activity > in the trunk version of that code lately. I can try with the 1.5.x branch > code to see if it makes a difference. Probably won't get to this before > Monday though. > >> > >> - Steve > >> > >> -----Original Message----- > >> From: Yann Ylavic [mailto:[email protected]] > >> Sent: Friday, July 18, 2014 4:51 PM > >> To: httpd > >> Subject: Re: Question about async mod_proxy_wstunnel and threads > >> > >> Hi Steve, > >> > >> can you still reproduce with the latest APR 1.5.x, notably containing > this fix: http://svn.apache.org/r1605769. > >> I don't think there is a released version with this patch... > >> > >> Regards, > >> Yann. > >> > >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep < > [email protected]> wrote: > >>> I've attached annotated logs that show the issues I described. Both > scenarios have ProxyWebsocketAsync turned on. The first does not use the > AsyncDelay and shows how server messages stall and are not delivered until > the client polls. The second has ProxyWebsocketAsyncDelay set to 100. In > that case, message processing works properly, but threads are held open and > there is no sign of async processing. > >>> > >>> Since my build and execution environment were somewhat non-standard, I > repeated all the tests today on stock Ubuntu 14.04, with a fresh checkout > and build of httpd and apr trunk code. The same results were observed. > >>> > >>> - Steve > >>> > >>> > >>> > >>> -----Original Message----- > >>> ________________________________________ > >>> From: Eric Covener [[email protected]] > >>> Sent: July 17, 2014 9:15 PM > >>> To: Apache HTTP Server Development List > >>> Subject: Re: Question about async mod_proxy_wstunnel and threads > >>> > >>> I am having trouble seeing it mis-behave. w/ Async and AsyncDelay, I > >>> am seeing the expected trace messages and when I look at backtraces > >>> of httpd I can see zero threads in wstunnel . If I send a server > >>> msg, I get it ASAP in the client -- and then I see 1 thread in poll > >>> for the right couple of seconds > >>> > >>> Can you grab trace at e.g. > >>> > >>> LogLevel INFO proxy_wstunnel_module:trace8 > >>> > >>> And annotate the timing a bit for what you do in the client? Is it > possible you have an un-updated trunk from several weeks ago? There was an > optimization put in and backed out that might have broke some of these same > things for a very short window. > >>> > > > > -- > Eric Covener > [email protected] >
