Cool. I will test this out.

One other issue I found was with the ProxyWebsocketAsyncDelay setting. The
docs say that this value is in milliseconds, but the code actually parses it
as seconds. While testing, what I thought was a 100 ms delay was actually 100
seconds, so I wasn't waiting long enough to see the connections suspend. The
code that parses this is here:

Index: mod_proxy_wstunnel.c
===================================================================
--- mod_proxy_wstunnel.c        (revision 1611738)
+++ mod_proxy_wstunnel.c        (working copy)
@@ -548,7 +548,7 @@
 static const char * proxyws_set_aysnch_delay(cmd_parms *cmd, void *conf, const char *val)
 {
     proxyws_dir_conf *dconf = conf;
-    if (ap_timeout_parameter_parse(val, &(dconf->async_delay), "s") != APR_SUCCESS)
+    if (ap_timeout_parameter_parse(val, &(dconf->async_delay), "ms") != APR_SUCCESS)
         return "ProxyWebsocketAsyncDelay timeout has wrong format";
     return NULL;
 }

The same issue appears to apply to the ProxyWebsocketIdleTimeout setting as
well.
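
In the meantime, a workaround that should sidestep the ambiguity is to give the
value an explicit unit suffix, since ap_timeout_parameter_parse honors a suffix
regardless of the default unit passed to it. A sketch (the values here are just
illustrative, not a recommendation):

    # Workaround sketch: explicit unit suffixes make the parser's default
    # unit ("s" vs "ms") irrelevant either way.
    ProxyWebsocketAsyncDelay  100ms
    ProxyWebsocketIdleTimeout 60s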

- Steve

From: Edward Lu [mailto:[email protected]]
Sent: Friday, July 25, 2014 11:38 AM
To: [email protected]
Subject: Re: Question about async mod_proxy_wstunnel and threads

I think I've found the issue; there's an extraneous apr_pollset_remove() call 
in event.c:1978. Relevant code (double-slash comments are mine):

    /* We only signal once per N sockets with this baton */
    if (!(baton->signaled)) {
        baton->signaled = 1;
        te = event_get_timer_event(-1 /* fake timer */,
                                   baton->cbfunc,
                                   baton->user_baton,
                                   0, /* don't insert it */
                                   NULL /* no associated socket callback */);
        /* remove other sockets in my set */
        // says remove "other" sockets, but it really removes all
        for (i = 0; i < baton->nsock; i++) {
            rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
        }

        push_timer2worker(te);
    }
    // This causes the problem; if the sockets are processed and re-added
    //  before this is called, the backend socket gets re-removed, never to
    //  be seen again (until the client sends something)
    apr_pollset_remove(event_pollset, out_pfd);

It seems like the solution is just to remove that extra apr_pollset_remove():
all of the sockets are already removed from the pollset when one of them is
signaled, and if another one is signaled in the same poll, it will enter this
block and simply do nothing (because of the baton->signaled check). I gave it a
quick test and the problem appeared to be solved.
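
To be concrete, with only that one call deleted, the tail of the block above
would end up looking like this (just a sketch of what I tested, nothing else
changed):

        /* remove other sockets in my set */
        for (i = 0; i < baton->nsock; i++) {
            rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
        }

        push_timer2worker(te);
    }
    /* the unconditional apr_pollset_remove(event_pollset, out_pfd) that used
     * to follow here is simply gone; the loop above already removed every
     * socket in the baton when the first one was signaled */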

On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu 
<[email protected]> wrote:
I got it to break with these Python scripts, as well as with my own test client 
and server (browser JavaScript and node.js). Strangely, though, I can't reproduce 
it with only 2 clients on either of those programs; I have to get 3 clients 
connected at once. Past that, I see exactly the behavior you describe.

I agree that it seems to be some sort of timing issue. I've been trying to put 
trace statements into httpd to see what information I can get, but once I do 
that it starts behaving as expected. Still investigating...

On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep 
<[email protected]> wrote:
Another tidbit of information ...

I attempted to see what was going on by running httpd in the foreground under 
strace. When I did that, the tests worked perfectly, which suggests it is some 
sort of timing issue. Running httpd in the foreground without strace exhibited 
the problem.

Here's the output of strace during the test. I sent the string "hhhhhhhh" from 
client 1 and it was echoed back by the server to both clients, as expected.

[pid 15451] <... futex resumed> )       = 0
[pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, 0) = 1
[pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
[pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
[pid 15451] read(12, 0xb6a39aa8, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
[pid 15451] read(12, 0xa96084b0, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] epoll_wait(13, {}, 2, 0)    = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, u64=144044472}}) = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, u64=144044496}}) = 0
[pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
[pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, 2, 0) = 1
[pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
[pid 15451] read(9, 0xb6a39aa8, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
[pid 15451] read(9, 0xa96084b0, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] epoll_wait(10, {}, 2, 0)    = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, u64=2843796920}}) = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, u64=2843796944}}) = 0
[pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
[pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished ...>


- Steve

-----Original Message-----
From: Steve Zweep
Sent: Monday, July 21, 2014 11:56 AM
To: '[email protected]'
Subject: RE: Question about async mod_proxy_wstunnel and threads

My client and server are based on sample code included with the ws4py package 
(http://ws4py.readthedocs.org/en/latest/). We use CherryPy and the ws4py 
package contains a CherryPy-based server (attached).

The client is a variation of the echo_client.py sample code also included in 
the ws4py package.

In terms of failure rate, I'd say it is about 90%. Often I start 2 clients and 
it works OK initially, but then stops after a message or two. Once it has 
failed, I can send any number of messages from the first client and continue to 
see the server echo them back, but no message is received by the second. When I 
then send a message from the second client, I see a flood of the messages from 
the server that had been queued.
- Steve


-----Original Message-----
From: Eric Covener [mailto:[email protected]]
Sent: Saturday, July 19, 2014 8:16 AM
To: Apache HTTP Server Development List
Subject: Re: Question about async mod_proxy_wstunnel and threads

I guess it is kind of telling that we could still respond to the write from 
client 2; there are probably more potential errors that would make us lose track 
of both sides simultaneously.

Can you share your ws client and server if they're easy to use? My little 
sample doesn't spray that way.

I'm also curious approximately how often it fails this way.

On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic 
<[email protected]> wrote:
> Probably a false trail; trunk at r1611741 is up to date, so I think 1.5.x
> won't help.
>
> On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
> <[email protected]> wrote:
>> Hi Yann,
>>
>> The test I ran today was built with APR from trunk (r1611741). I see that 
>> r1605769 modifies apr_skiplist.c and there has been a lot of activity in the 
>>  trunk version of that code lately. I can try with the 1.5.x branch code to 
>> see if it makes a difference. Probably won't get to this before Monday 
>> though.
>>
>> - Steve
>>
>> -----Original Message-----
>> From: Yann Ylavic [mailto:[email protected]]
>> Sent: Friday, July 18, 2014 4:51 PM
>> To: httpd
>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>>
>> Hi Steve,
>>
>> Can you still reproduce with the latest APR 1.5.x, notably containing this 
>> fix: http://svn.apache.org/r1605769?
>> I don't think there is a released version with this patch...
>>
>> Regards,
>> Yann.
>>
>> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep 
>> <[email protected]> wrote:
>>> I've attached annotated logs that show the issues I described. Both 
>>> scenarios have ProxyWebsocketAsync turned on. The first does not use the 
>>> AsyncDelay and shows how server messages stall and are not delivered until 
>>> the client polls. The second has ProxyWebsocketAsyncDelay set to 100. In 
>>> that case, message processing works properly, but threads are held open and 
>>> there is no sign of async processing.
>>>
>>> Since my build and execution environment were somewhat non-standard, I 
>>> repeated all the tests today on stock Ubuntu 14.04, with a fresh checkout 
>>> and build of httpd and apr trunk code. The same results were observed.
>>>
>>> - Steve
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Eric Covener [[email protected]]
>>> Sent: July 17, 2014 9:15 PM
>>> To: Apache HTTP Server Development List
>>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>>>
>>> I am having trouble seeing it misbehave. With Async and AsyncDelay, I
>>> am seeing the expected trace messages, and when I look at backtraces
>>> of httpd I can see zero threads in wstunnel. If I send a server
>>> message, I get it ASAP in the client -- and then I see 1 thread in poll
>>> for the right couple of seconds.
>>>
>>> Can you grab trace at e.g.
>>>
>>> LogLevel INFO proxy_wstunnel_module:trace8
>>>
>>> And annotate the timing a bit for what you do in the client? Is it 
>>> possible you have an un-updated trunk from several weeks ago? There was an 
>>> optimization put in and backed out that might have broken some of these 
>>> same things for a very short window.
>>>



--
Eric Covener
[email protected]

