https://bz.apache.org/bugzilla/show_bug.cgi?id=60956

--- Comment #1 from Frank Meier <frank.me...@ergon.ch> ---
I finally was able to reproduce the phenomenon. It occurs if a request
handler triggers the asynchronous write completion feature, which gives the
listener thread the opportunity to send the final bytes of a response to the
client asynchronously, without blocking a worker thread. But if the client
refuses to read the data, the connection gets stalled (TCP Window FULL message
in Wireshark). This does not block the listener thread, since it does the
writing asynchronously, and a stalled connection is not a problem. But then,
after the timeout ([1] default 60s), the listener thread wants to close the
connection and triggers start_lingering_close_nonblocking() and the listener
thread gets blocked as described above. After another timeout interval [1] the
listener thread recovers from its misery.

The tricky part of reproducing this is to get the right amount of data locked
in the TCP pipeline (receive buffer of the client, and the send buffer of the
server).
1) If the client blocks too early, the pipeline fills up while the module still
has more than 64k of data to send, and the asynchronous write completion
feature is not triggered.
2) If the client blocks too late, there is enough "space" in the TCP pipeline to
accommodate all the remaining bytes including the SSL shutdown alert, in which
case the start_lingering_close_nonblocking() function does not block.
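
The two cases above can be read as a window on the number of bytes that
httpd still holds when the client stops reading. The following is only a
rough model of that reasoning, not something taken from the httpd source:
the function name, the constant name and the assumption that "64k" means
64 * 1024 bytes are mine, and the few bytes of the SSL shutdown alert are
ignored.

```python
# Threshold quoted in the comment: with more than this much data left,
# the asynchronous write completion is not triggered.
# Assumption: "64k" is taken to mean 64 * 1024 bytes.
ASYNC_WRITE_LIMIT = 64 * 1024


def classify_stall(total_bytes, client_read_bytes, pipeline_capacity):
    """Guess which case a test run falls into (hypothetical helper).

    total_bytes       -- size of the response the module generates
    client_read_bytes -- bytes the client reads before it stops
    pipeline_capacity -- bytes that fit into the client's receive buffer
                         plus the server's send buffer
    """
    # Bytes that neither the client nor the TCP pipeline can take.
    pending = total_bytes - client_read_bytes - pipeline_capacity
    if pending > ASYNC_WRITE_LIMIT:
        return "too early"   # case 1: write completion is not triggered
    if pending <= 0:
        return "too late"    # case 2: everything fits, close does not block
    return "reproduces"      # stalled write completion, then blocking close
```

Since the pipeline capacity differs between systems, a model like this can
only suggest a starting value for the amount of data to request.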

I've written some test code to make this easier to reproduce:
* an httpd module (mod_gendata) which generates a given amount of body data,
where the last 60k are not flushed; this should trigger the asynchronous write
completion in the listener thread.
* a special HTTPS client that reads a given amount of data from the server and
then stops reading completely.
* an httpd.conf file that starts only a single httpd process with 2 worker
threads, which makes it easy to see what's happening when we look at the stack
of the process.
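
For readers without the attached test code, a stalling HTTPS client of the
kind described in the second bullet could look roughly like this. It is a
sketch, not the client attached to this bug. The function names, the URL
path in the usage line and the disabled certificate checks (which assume a
self-signed test server) are all my assumptions.

```python
import socket
import ssl
import time


def build_request(host, path):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def read_then_stall(host, port, path, read_bytes, hold_seconds):
    """Read at most read_bytes of the response, then stop reading.

    The socket stays open for hold_seconds without any further recv(), so
    the client's receive buffer and then the server's send buffer fill up.
    Returns the number of bytes actually read.
    """
    context = ssl.create_default_context()
    # Assumption: a test server with a self-signed certificate.
    # check_hostname must be disabled before verify_mode is relaxed.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            received = 0
            while received < read_bytes:
                # The TLS layer may pull a whole record off the socket,
                # so the kernel-level count can be slightly higher.
                chunk = tls.recv(min(4096, read_bytes - received))
                if not chunk:
                    break  # server closed the connection early
                received += len(chunk)
            time.sleep(hold_seconds)  # stall: keep the socket, never read
            return received


# Hypothetical usage; the path and its parameter are made up:
# read_then_stall("localhost", 443, "/gendata?size=870400", 1000, 180)
```

The hold time should be longer than two timeout intervals [1] so that both
the lingering close and the recovery of the listener thread can be observed.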

On my test system the TCP pipeline was full at around 800k. So I request 850k
of data, read ~1000 bytes so the headers are received and then stop receiving.
I can tell that the write completion was triggered if all the worker threads
are idle (check with gstack). And I can tell that the TCP pipeline is full if
the TCP connection does *not* enter the FIN_WAIT1 state after the configured
KeepAliveTimeout [2] (check with netstat). If both conditions are met, the
listener thread calls start_lingering_close_nonblocking() after 60s, and
blocks. It may take some tries to figure out the right amount of data that has
to be requested to get it right.
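
The netstat check can be scripted when several tries are needed. The helper
below is only a sketch. It assumes the column layout printed by the Linux
net-tools `netstat -tn` (Proto, Recv-Q, Send-Q, Local Address, Foreign
Address, State), and the function name is made up.

```python
def connections_in_state(netstat_output, state, local_port):
    """Return the netstat lines for TCP connections on local_port in state.

    Assumes the six-column layout of Linux net-tools `netstat -tn`;
    other platforms print different columns.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, tcp_state = fields[3], fields[5]
        port = local_address.rsplit(":", 1)[-1]
        if tcp_state == state and port == str(local_port):
            matches.append(line)
    return matches


# Possible usage on the server, here for port 443:
# import subprocess
# out = subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
# stalled = not connections_in_state(out, "FIN_WAIT1", 443)
```

An empty result for FIN_WAIT1 after the KeepAliveTimeout [2] corresponds to
the "pipeline is full" condition described above.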


[1] https://httpd.apache.org/docs/2.4/mod/core.html#timeout
[2] https://httpd.apache.org/docs/2.4/mod/core.html#keepalivetimeout
