[
https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Kot-Zaniewski updated SOLR-18087:
--------------------------------------
Description:
There appear to be some severe regressions following the expanded use of the
HTTP/2 client since at least 9.8, most notably in the stream handler and in
index recovery. The impact is at the very least slowness, and in some cases
outright response stalling; the stalling appears to be caused by HTTP/2's flow
control. The obvious thing these two very different workloads have in common is
that they stream large responses, which means, among other things, that they
are more directly exposed to HTTP/2's flow-control mechanism.
In my testing I have tweaked the following parameters (a configuration sketch
showing where these knobs live in Jetty's client API follows the list):
# http1 vs http2 - as stated, http1 seems to be strictly better, i.e. faster
and more stable.
# shards per node - the more shards per node, the more (large, simultaneous)
responses share a single connection during inter-node communication. This has
generally resulted in poorer performance.
# maxConcurrentStreams - reducing this to, say, 1 effectively circumvents
multiplexing. Doing so does seem to improve index recovery over HTTP/2, but it
is not a good setting to keep in production because it is global and affects
*everything*, not just recovery or streaming.
# initialSessionRecvWindow - the amount of receive buffer the client initially
gets for each connection. It is shared by all of the responses multiplexed onto
that connection.
# initialStreamRecvWindow - the amount of receive buffer each stream initially
gets within a single HTTP/2 session. I've found that when this is too big
relative to initialSessionRecvWindow it can lead to stalling because of
flow-control enforcement.
# Simple vs Buffering FlowControlStrategy - controls how frequently the client
sends a WINDOW_UPDATE frame to signal that the server may send more data.
"Simple" sends the frame after consuming any number of bytes, while "Buffering"
waits until a consumption threshold is met. So far "Simple" has NOT worked
reliably for me, which is probably why the default is "Buffering".
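To make those knobs concrete, here is a minimal sketch of where they live on
Jetty's HTTP/2 client (Jetty 10 package names, as used by Solr 9.x). The window
sizes, buffer ratio, and endpoint URL are illustrative assumptions, not
recommended values:
{code:java}
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.BufferingFlowControlStrategy;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class Http2TuningSketch {
    public static void main(String[] args) throws Exception {
        HTTP2Client http2Client = new HTTP2Client();

        // Per-connection receive window, shared by every stream multiplexed
        // onto the session (16 MiB is an illustrative value).
        http2Client.setInitialSessionRecvWindow(16 * 1024 * 1024);

        // Per-stream receive window. Oversizing this relative to the session
        // window is what appeared to trigger the stalls described above.
        http2Client.setInitialStreamRecvWindow(2 * 1024 * 1024);

        // "Buffering" delays WINDOW_UPDATE frames until half the window has
        // been consumed; SimpleFlowControlStrategy would send one after any read.
        http2Client.setFlowControlStrategyFactory(
            () -> new BufferingFlowControlStrategy(0.5f));

        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client));
        client.start();
        try {
            // Hypothetical local Solr endpoint, just to exercise the client.
            System.out.println(
                client.GET("http://localhost:8983/solr/admin/info/system").getStatus());
        } finally {
            client.stop();
        }
    }
}
{code}
Note that maxConcurrentStreams is not set here: it is a limit each peer
advertises in its SETTINGS frame rather than a per-request knob, which is
consistent with the caveat above that lowering it is global.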
I’m attaching summaries of my findings, some of which can be reproduced by
running the appropriate benchmark in this
[branch|https://github.com/kotman12/solr/tree/http2-shenanigans].
The stream-benchmark-results.md file includes the command I ran to produce the
result described.
Next steps:
Reproduce this in a pure Jetty example. I am beginning to think that multiple
large responses being streamed simultaneously between the same client and
server may be some kind of edge case in the library or the protocol itself. It
may have something to do with how Jetty's InputStreamResponseListener is
implemented, although according to the docs it _should_ be compatible with
HTTP/2. Furthermore, there may be other levers offered by HTTP/2 that are not
yet exposed by the Jetty API.
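A minimal shape for that reproduction might look like the following (again
assuming Jetty 10 APIs; the endpoint URL is hypothetical). The idea would be to
run several of these consumers concurrently against the same server so the
large responses share one multiplexed connection:
{code:java}
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Response;
import org.eclipse.jetty.client.util.InputStreamResponseListener;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class StreamStallRepro {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(new HTTP2Client()));
        client.start();
        try {
            InputStreamResponseListener listener = new InputStreamResponseListener();
            // Hypothetical endpoint that streams a very large body.
            client.newRequest("http://localhost:8080/large-response").send(listener);
            Response response = listener.get(15, TimeUnit.SECONDS);

            long total = 0;
            try (InputStream in = listener.getInputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) >= 0; ) {
                    // A slow consumer here delays the WINDOW_UPDATE frames
                    // that refill the server's send window; with several of
                    // these loops sharing one connection, the session window
                    // can be exhausted and every stream on it stalls.
                    total += n;
                }
            }
            System.out.println(response.getStatus() + ", bytes=" + total);
        } finally {
            client.stop();
        }
    }
}
{code}
Deliberately throttling the read loop should make any flow-control stall easy
to observe.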
On the other hand, we could consider having separate connection pools for HTTP
clients that stream large responses. There seems to be at least [some
precedent|https://www.akamai.com/site/en/documents/research-paper/domain-sharding-for-faster-http2-in-lossy-cellular-networks.pdf]
for doing this.
> We investigate and develop a new domain-sharding technique that isolates
> large downloads on separate TCP connections, while keeping downloads of small
> objects on a single connection.
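In Jetty terms, the cheapest form of that isolation is simply a second
HttpClient instance, since each client owns its own connection pools. A sketch
of the idea; defaulting the bulk client to HTTP/1.1 is one assumed option,
consistent with the http1 results above:
{code:java}
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class SplitClientsSketch {
    public static void main(String[] args) throws Exception {
        // Multiplexed HTTP/2 client for the usual short, bursty
        // inter-node requests.
        HttpClient smallRequests =
            new HttpClient(new HttpClientTransportOverHTTP2(new HTTP2Client()));

        // A separate client, and therefore a separate connection pool,
        // reserved for large streamed responses. Defaulting it to HTTP/1.1
        // sidesteps HTTP/2 flow control entirely, echoing the paper's
        // isolation of large downloads on their own TCP connections.
        HttpClient bulkStreams = new HttpClient();

        smallRequests.start();
        bulkStreams.start();
        // ... route stream handler / recovery traffic through bulkStreams ...
        bulkStreams.stop();
        smallRequests.stop();
    }
}
{code}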
HTTP/2 seems designed for [bursty, small
traffic|https://hpbn.co/http2/#one-connection-per-origin], which may be why
flow control does not hurt that kind of workload as much. Also, if your payload
is small relative to your headers then HTTP/2's header compression might be a
big win, but for large responses, not as much.
> Most HTTP transfers are short and bursty, whereas TCP is optimized for
> long-lived, bulk data transfers.
> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
> Key: SOLR-18087
> URL: https://issues.apache.org/jira/browse/SOLR-18087
> Project: Solr
> Issue Type: Bug
> Reporter: Luke Kot-Zaniewski
> Priority: Major
> Labels: pull-request-available
> Attachments: flow-control-stall.log, index-recovery-tests.md,
> stream-benchmark-results.md
>
> Time Spent: 10m
> Remaining Estimate: 0h
>