[
https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Kot-Zaniewski updated SOLR-18087:
--------------------------------------
Description:
There appear to be some severe regressions following the expanded use of the
HTTP/2 client since at least 9.8, most notably in the stream handler and in
index recovery. The impact is at the very least slowness, and in some cases
outright response stalling; the stalling appears to be caused by HTTP/2's flow
control. The obvious thing these two very different workloads have in common is
that they stream large responses, which means, among other things, that they
are more directly exposed to HTTP/2's flow-control mechanism.
In my testing I have tweaked the following parameters (a configuration sketch
showing where these knobs live in Jetty's client API follows the list):
# http1 vs http2 - as stated, http1 seems to be strictly better, i.e. faster
and more stable.
# shards per node - the more shards per node, the more (large, simultaneous)
responses share a single connection during inter-node communication. This has
generally resulted in poorer performance.
# maxConcurrentStreams - reducing this to, say, 1 effectively circumvents
multiplexing. Doing so does seem to improve index recovery over HTTP/2, but it
is not a good setting to keep in production because it is global and affects
*everything*, not just recovery or streaming.
# initialSessionRecvWindow - the amount of receive buffer the client initially
gets for each connection. It is shared by all of the responses multiplexed onto
that connection.
# initialStreamRecvWindow - the amount of receive buffer each stream initially
gets within a single HTTP/2 session. I've found that when this is too big
relative to initialSessionRecvWindow it can lead to stalling because of
flow-control enforcement.
# Simple vs Buffering FlowControlStrategy - controls how frequently the client
sends a WINDOW_UPDATE frame to signal that the server may send more data.
"Simple" sends the frame after consuming any number of bytes, while "Buffering"
waits until a consumption threshold is met. So far "Simple" has NOT worked
reliably for me, which is probably why the default is "Buffering".
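To make those knobs concrete, here is a minimal sketch of where they live on
Jetty's HTTP/2 client (Jetty 10 package names, as used by Solr 9.x). The window
sizes, buffer ratio, and endpoint URL are illustrative assumptions, not
recommended values:
{code:java}
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.BufferingFlowControlStrategy;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class Http2TuningSketch {
    public static void main(String[] args) throws Exception {
        HTTP2Client http2Client = new HTTP2Client();

        // Per-connection receive window, shared by every stream multiplexed
        // onto the session (16 MiB is an illustrative value).
        http2Client.setInitialSessionRecvWindow(16 * 1024 * 1024);

        // Per-stream receive window. Oversizing this relative to the session
        // window is what appeared to trigger the stalls described above.
        http2Client.setInitialStreamRecvWindow(2 * 1024 * 1024);

        // "Buffering" delays WINDOW_UPDATE frames until half the window has
        // been consumed; SimpleFlowControlStrategy would send one after any read.
        http2Client.setFlowControlStrategyFactory(
            () -> new BufferingFlowControlStrategy(0.5f));

        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client));
        client.start();
        try {
            // Hypothetical local Solr endpoint, just to exercise the client.
            System.out.println(
                client.GET("http://localhost:8983/solr/admin/info/system").getStatus());
        } finally {
            client.stop();
        }
    }
}
{code}
Note that maxConcurrentStreams is not set here: it is a limit each peer
advertises in its SETTINGS frame rather than a per-request knob, which is
consistent with the caveat above that lowering it is global.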
I’m attaching summaries of my findings, some of which can be reproduced by
running the appropriate benchmark in this
[branch|https://github.com/kotman12/solr/tree/http2-shenanigans].
The stream-benchmark-results.md file includes the command I ran to produce the
result described.
Next steps:
Reproduce this in a pure Jetty example. I am beginning to think that multiple
large responses being streamed simultaneously between the same client and
server may be some kind of edge case in the library or the protocol itself. It
may have something to do with how Jetty's InputStreamResponseListener is
implemented, although according to the docs it _should_ be compatible with
HTTP/2. Furthermore, there may be other levers offered by HTTP/2 that are not
yet exposed by the Jetty API.
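A minimal shape for that reproduction might look like the following (again
assuming Jetty 10 APIs; the endpoint URL is hypothetical). The idea would be to
run several of these consumers concurrently against the same server so the
large responses share one multiplexed connection:
{code:java}
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Response;
import org.eclipse.jetty.client.util.InputStreamResponseListener;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class StreamStallRepro {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(new HTTP2Client()));
        client.start();
        try {
            InputStreamResponseListener listener = new InputStreamResponseListener();
            // Hypothetical endpoint that streams a very large body.
            client.newRequest("http://localhost:8080/large-response").send(listener);
            Response response = listener.get(15, TimeUnit.SECONDS);

            long total = 0;
            try (InputStream in = listener.getInputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) >= 0; ) {
                    // A slow consumer here delays the WINDOW_UPDATE frames
                    // that refill the server's send window; with several of
                    // these loops sharing one connection, the session window
                    // can be exhausted and every stream on it stalls.
                    total += n;
                }
            }
            System.out.println(response.getStatus() + ", bytes=" + total);
        } finally {
            client.stop();
        }
    }
}
{code}
Deliberately throttling the read loop should make any flow-control stall easy
to observe.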
On the other hand, we could consider having separate connection pools for HTTP
clients that stream large responses. There seems to be at least [some
precedent|https://www.akamai.com/site/en/documents/research-paper/domain-sharding-for-faster-http2-in-lossy-cellular-networks.pdf]
for doing this.
> We investigate and develop a new domain-sharding technique that isolates
> large downloads on separate TCP connections, while keeping downloads of small
> objects on a single connection.
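In Jetty terms, the cheapest form of that isolation is simply a second
HttpClient instance, since each client owns its own connection pools. A sketch
of the idea; defaulting the bulk client to HTTP/1.1 is one assumed option,
consistent with the http1 results above:
{code:java}
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class SplitClientsSketch {
    public static void main(String[] args) throws Exception {
        // Multiplexed HTTP/2 client for the usual short, bursty
        // inter-node requests.
        HttpClient smallRequests =
            new HttpClient(new HttpClientTransportOverHTTP2(new HTTP2Client()));

        // A separate client, and therefore a separate connection pool,
        // reserved for large streamed responses. Defaulting it to HTTP/1.1
        // sidesteps HTTP/2 flow control entirely, echoing the paper's
        // isolation of large downloads on their own TCP connections.
        HttpClient bulkStreams = new HttpClient();

        smallRequests.start();
        bulkStreams.start();
        // ... route stream handler / recovery traffic through bulkStreams ...
        bulkStreams.stop();
        smallRequests.stop();
    }
}
{code}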
HTTP/2 seems designed for [bursty, small
traffic|https://hpbn.co/http2/#one-connection-per-origin], which may be why
flow control does not hurt that kind of workload as much. Also, if your payload
is small relative to your headers then HTTP/2's header compression might be a
big win, but for large responses, not as much.
> Most HTTP transfers are short and bursty, whereas TCP is optimized for
> long-lived, bulk data transfers.
> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
> Key: SOLR-18087
> URL: https://issues.apache.org/jira/browse/SOLR-18087
> Project: Solr
> Issue Type: Bug
> Reporter: Luke Kot-Zaniewski
> Priority: Major
> Labels: pull-request-available
> Attachments: flow-control-stall.log, index-recovery-tests.md,
> stream-benchmark-results.md
>
> Time Spent: 10m
> Remaining Estimate: 0h
>