[
https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Kot-Zaniewski updated SOLR-18087:
--------------------------------------
Attachment: (was: index-recovery-tests.md)
> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
> Key: SOLR-18087
> URL: https://issues.apache.org/jira/browse/SOLR-18087
> Project: Solr
> Issue Type: Bug
> Reporter: Luke Kot-Zaniewski
> Priority: Major
> Labels: pull-request-available
> Attachments: flow-control-stall.log, index-recovery-tests.md,
> stream-benchmark-results.md
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There appear to be some severe regressions since the expansion of HTTP/2
> client usage in at least 9.8, most notably in the stream handler and in
> index recovery. The impact is at the very least slowness and, in some
> cases, outright response stalling, which appears to be caused by HTTP/2's
> flow control. The obvious thing these two very different workloads have in
> common is that they stream large responses. This means, among other
> things, that they may be more directly impacted by HTTP/2's flow-control
> mechanism.
> In my testing I have tweaked the following parameters:
> # http1 vs http2 - as stated, http1 seems to be strictly better here: both
> faster and more stable.
> # shards per node - the greater the number of shards per node, the more
> (large, simultaneous) responses share a single connection during inter-node
> communication. This has generally resulted in poorer performance.
> # maxConcurrentStreams - reducing this to, say, 1 effectively circumvents
> multiplexing, which does seem to improve index recovery over HTTP/2, but it
> is not a good setting to keep in production because it is global and
> affects *everything*, not just recovery or streaming.
> # initialSessionRecvWindow - the amount of receive buffer the client
> initially advertises for each connection, shared by the many responses
> multiplexed onto that connection.
> # initialStreamRecvWindow - the amount of receive buffer each stream
> initially gets within a single HTTP/2 session. I've found that when this is
> too big relative to initialSessionRecvWindow it can lead to stalling
> because of flow-control enforcement.
> # Simple vs Buffering flow-control strategy - controls how frequently the
> client sends a WINDOW_UPDATE frame to signal the server to send more data.
> "Simple" sends the frame after consuming any number of bytes, while
> "Buffering" waits until a consumption threshold is met. So far "Simple" has
> NOT worked reliably for me, which is probably why the default is
> "Buffering".
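To make points 4 and 5 above concrete, here is a toy round-robin model of HTTP/2 flow control (this is not Jetty code; the method and class names are made up for the example, and the window sizes are hypothetical). Every DATA frame a server writes must fit in *both* the shared session window and the per-stream window, so when the stream window is as large as the session window, concurrent large responses drain the session window long before any stream window is exhausted:

```java
import java.util.Arrays;

public class FlowControlSim {

    // Simulates a server writing frame-sized DATA frames round-robin on
    // `streams` concurrent streams while the client consumes nothing (so no
    // WINDOW_UPDATE frames ever arrive). Each write is capped by both the
    // shared session window and the per-stream window. Returns the bytes
    // sent on each stream at the moment everything stalls.
    static long[] sendUntilStall(long sessionWindow, long streamWindow,
                                 int streams, long frame) {
        long session = sessionWindow;          // shared by every stream
        long[] perStream = new long[streams];  // one window per stream
        Arrays.fill(perStream, streamWindow);
        long[] sent = new long[streams];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < streams; i++) {
                long n = Math.min(frame, Math.min(session, perStream[i]));
                if (n > 0) {
                    session -= n;
                    perStream[i] -= n;
                    sent[i] += n;
                    progress = true;
                }
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        // 4 large responses multiplexed on one connection, with the stream
        // window as big as the whole session window (made-up 1 MiB values).
        long[] sent = sendUntilStall(1 << 20, 1 << 20, 4, 16 * 1024);
        System.out.println(Arrays.toString(sent));
        // Each stream stalls after 256 KiB: the shared session window is
        // exhausted while 3/4 of every per-stream window is still unused.
    }
}
```

In this model the stall clears only when the client consumes data and sends WINDOW_UPDATE frames, which is exactly where the Simple-vs-Buffering strategy choice in point 6 comes in.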
> I’m attaching summaries of my findings, some of which can be reproduced by
> running the appropriate benchmark in this
> [branch|https://github.com/kotman12/solr/tree/http2-shenanigans].
> The stream-benchmark-results.md file includes the command I ran to produce
> the results described.
> Next steps:
> Reproduce this in a pure Jetty example. I am beginning to think that
> multiple large responses streamed simultaneously between the same client
> and server may be some kind of edge case in the library or the protocol
> itself. It may have something to do with how Jetty's
> InputStreamResponseListener is implemented, although according to the docs
> it _should_ be compatible with HTTP/2. Furthermore, there may be some other
> levers offered by HTTP/2 which are not yet exposed by the Jetty API.
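For reference, the levers Jetty does expose today can be wired up roughly like this (a sketch against the Jetty 10 API used by Solr 9.x, not compiled or benchmarked; the window sizes are placeholders, not recommendations):

```java
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.BufferingFlowControlStrategy;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

// Sketch: wires the flow-control knobs discussed in this ticket into a
// Jetty HTTP/2 client. Values are placeholders for illustration only.
class StreamingClientFactory {
    static HttpClient newStreamingClient() throws Exception {
        HTTP2Client http2Client = new HTTP2Client();
        // Per-connection receive budget, shared by all multiplexed streams.
        http2Client.setInitialSessionRecvWindow(16 * 1024 * 1024);
        // Per-stream budget, kept well below the session window to avoid
        // the stall described in points 4 and 5.
        http2Client.setInitialStreamRecvWindow(4 * 1024 * 1024);
        // "Buffering": send WINDOW_UPDATE only after 50% of a window is
        // consumed; new SimpleFlowControlStrategy() would update eagerly.
        http2Client.setFlowControlStrategyFactory(
                () -> new BufferingFlowControlStrategy(0.5f));
        HttpClient httpClient =
                new HttpClient(new HttpClientTransportOverHTTP2(http2Client));
        httpClient.start();
        return httpClient;
    }
}
```

(Configuration fragment only; it needs jetty-client and jetty-http2-client on the classpath, and Jetty 12 moved the transport class to a different package.)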
> On the other hand, we could consider having separate connection pools for
> HTTP clients that stream large responses. There seems to be at least [some
> precedent|https://www.akamai.com/site/en/documents/research-paper/domain-sharding-for-faster-http2-in-lossy-cellular-networks.pdf]
> for doing this.
> > We investigate and develop a new domain-sharding technique that isolates
> > large downloads on separate TCP connections, while keeping downloads of
> > small objects on a single connection.
> HTTP/2 seems designed for [bursty, small
> traffic|https://hpbn.co/http2/#one-connection-per-origin], which is why
> flow control may not impact it as much. Also, if your payload is small
> relative to your headers, HTTP/2's header compression might be a big win,
> but for large responses, not as much.
> > Most HTTP transfers are short and bursty, whereas TCP is optimized for
> > long-lived, bulk data transfers.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]