> On 30.06.2021 at 18:01, Eric Covener <cove...@gmail.com> wrote:
> 
> On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
> <stefan.eiss...@greenbytes.de> wrote:
>> 
>> It looks like we stumbled upon an issue in 
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the 
>> lifetimes of our backend connections.
>> 
>> When a frontend connection causes a backend request and then drops, our 
>> backend connection only notices the loss when it attempts to pass some data. 
>> In normal http response processing, this is not an issue since response 
>> chunks usually come in quite frequently. The proxied connection will then 
>> fail to pass them to the aborted frontend connection and cleanup will occur.
>> 
>> However, with modern shenanigans such as Server-Sent Events (SSE), the 
>> request is supposed to be long running and will produce body chunks quite 
>> infrequently, e.g. every 30 seconds or so. This leaves our proxy workers 
>> hanging in recv() for quite a while and may lead to worker exhaustion.
>> 
>> We can say SSE is a bad idea anyway, but that will probably not stop people 
>> from doing such crazy things.
>> 
>> What other mitigations do we have?
>> - pthread_kill() will interrupt the recv() and probably make it fail
>> - we can use shorter socket timeouts on the backend and check r->connection 
>> status in between
>> - ???
> 
> 
> In trunk the tunnelling side of mod_proxy_http can go async and get
> called back for activity on either side by asking Event to watch both
> sockets.


How does that work, actually? Do we have an example somewhere?

> I'm not sure how browsers treat the SSE connection; can it ever have a
> subsequent request?  If not, maybe we could see the SSE Content-Type
> and shoehorn it into the tunneling (figuring out what to do with
> writes from the client, and backport the event and async tunnel stuff?)

I don't think they will do a subsequent request in the HTTP/1.1 sense,
meaning they'll close their H1 connection and open a new one. In H2 land,
the request connection is a virtual "secondary" one anyway.

But changing behaviour based on the content type seems inadequate. When
the server proxies applications (e.g. via uwsgi), the problem may also occur
for requests that are slow to produce responses.

To DoS such a setup, where a proxied response takes n seconds, you'd only need
total_workers / n aborted requests per second. In HTTP/1.1 those would all be
separate connections and perhaps noticeable to a supervisor, but in H2 this
could all happen on the same TCP connection (although our h2 implementation
has some protection against abusive client behaviour).
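
To put illustrative numbers on that (hypothetical figures, just for scale):
with 400 workers and a backend response that takes 30 seconds, roughly
400 / 30 ≈ 13 aborted requests per second would be enough to keep every
worker occupied.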

A general solution to the problem would therefore be valuable, imo.

We should think about solving this in the context of mpm_event, which
I believe is the recommended production setup and the one that merits our
efforts.

If mpm_event could track the link from one connection to another,
e.g. frontend to backend, it could wake up backends on a frontend
termination. Do you agree, Yann?

Could this be as easy as adding another "conn_rec *context" field
in conn_rec that tracks this?
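
For illustration, a very rough sketch of what I have in mind. These are
simplified stand-in types and made-up names, not the real conn_rec or any
actual trunk code:

    /* Simplified stand-ins for httpd's conn_rec; field and function
     * names here are hypothetical. */
    typedef struct demo_conn {
        int aborted;                    /* mirrors conn_rec->aborted */
        struct demo_conn *linked_peer;  /* the proposed cross-connection link */
    } demo_conn;

    /* mod_proxy side: tie frontend and backend together when the
     * backend connection is acquired for a request. */
    static void link_connections(demo_conn *frontend, demo_conn *backend)
    {
        frontend->linked_peer = backend;
        backend->linked_peer = frontend;
    }

    /* mpm_event side: when the frontend socket reports EOF/RST, mark the
     * linked backend as aborted and wake whatever is blocked on it. */
    static void on_frontend_abort(demo_conn *frontend)
    {
        demo_conn *backend = frontend->linked_peer;
        frontend->aborted = 1;
        if (backend && !backend->aborted) {
            backend->aborted = 1;
            /* ...then interrupt the recv() on the backend socket, e.g.
             * via a pollset wakeup or a socket shutdown. */
        }
    }

The interesting parts would of course be the wakeup itself and making the
linking safe across threads, which the sketch glosses over.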

- Stefan
