[ https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Kot-Zaniewski updated SOLR-18087:
--------------------------------------
    Attachment: index-recovery-tests.md
        Status: Open  (was: Open)

> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
>                 Key: SOLR-18087
>                 URL: https://issues.apache.org/jira/browse/SOLR-18087
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Luke Kot-Zaniewski
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: flow-control-stall.log, index-recovery-tests.md, 
> stream-benchmark-results.md
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There appear to be some severe regressions since the expansion of HTTP/2 
> client usage (at least as far back as 9.8), most notably in the stream 
> handler and in index recovery. The impact ranges from slowness to outright 
> response stalling, and the stalling appears to be caused by HTTP/2's flow 
> control. The obvious thing these two very different workloads share is that 
> they stream large responses, which means, among other things, that they may 
> be more directly affected by HTTP/2's flow-control mechanism.
> In my testing I have tweaked the following parameters:
>  # http1 vs http2 - as stated, HTTP/1.1 seems to be strictly better here: 
> faster and more stable.
>  # shards per node - the greater the number of shards per node the more 
> (large, simultaneous) responses share a single connection during inter-node 
> communication. This has generally resulted in poorer performance.
>  # maxConcurrentStreams - reducing this to, say, 1 effectively circumvents 
> multiplexing. That does seem to improve index recovery over HTTP/2, but it 
> is not a good setting to keep for production use because it is global and 
> affects *everything*, not just recovery or streaming.
>  # initialSessionRecvWindow - This is the amount of buffer the client gets 
> initially for each connection. This gets shared by the many responses that 
> share the multiplexed connection.
>  # initialStreamRecvWindow - This is the amount of buffer each stream gets 
> initially within a single HTTP/2 session. I've found that when this is too 
> big relative to initialSessionRecvWindow, it can lead to stalling because of 
> flow-control enforcement.
>  # Simple vs Buffering flow-control strategy - Controls how frequently the 
> client sends a WINDOW_UPDATE frame to signal the server to send more data. 
> "Simple" sends the frame after consuming any amount of bytes, while 
> "Buffering" waits until a consumption threshold is met. So far "Simple" has 
> NOT worked reliably for me, which is probably why the default is "Buffering".
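To make the interaction between the last three knobs concrete, here is a deliberately simplified, self-contained model of the failure mode. This is NOT Jetty code; the round-robin send pattern, the 50% update threshold, and the omission of session-level WINDOW_UPDATEs are all simplifying assumptions. It just illustrates how a session window that is small relative to the per-stream windows can wedge under a buffering-style flow-control strategy:

```java
// Toy model (not Jetty code) of an HTTP/2 flow-control stall: the session
// window is exhausted before any single stream has consumed enough of its own
// window to trigger a buffering-style WINDOW_UPDATE, so nothing more can be
// sent and nothing gets replenished.
public class FlowControlModel {

    /**
     * @param sessionWindow   bytes the server may have in flight per connection
     * @param streamWindow    bytes the server may have in flight per stream
     * @param streams         concurrent streams sharing the connection
     * @param updateThreshold fraction of streamWindow a stream must consume
     *                        before the client sends a WINDOW_UPDATE
     */
    static boolean stalls(int sessionWindow, int streamWindow, int streams,
                          double updateThreshold) {
        // Assume round-robin sending: each stream gets an equal share of the
        // session window, capped by its own stream window.
        int perStreamSent = Math.min(streamWindow, sessionWindow / streams);
        boolean sessionExhausted = (long) perStreamSent * streams >= sessionWindow;
        // A buffering strategy only replenishes once a stream crosses the threshold.
        boolean updateSent = perStreamSent >= updateThreshold * streamWindow;
        return sessionExhausted && !updateSent; // nothing sendable, nothing replenished
    }

    public static void main(String[] args) {
        // 64 KiB session window shared by 8 streams, each with a 1 MiB stream
        // window: the session window drains while every stream sits far below
        // its update threshold -> stall.
        System.out.println(stalls(64 * 1024, 1024 * 1024, 8, 0.5));
        // Stream windows sized below the per-stream share of the session
        // window: streams hit their thresholds and keep the window refilled.
        System.out.println(stalls(1024 * 1024, 64 * 1024, 8, 0.5));
    }
}
```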
> I’m attaching summaries of my findings, some of which can be reproduced by 
> running the appropriate benchmark in this 
> [branch|https://github.com/kotman12/solr/tree/http2-shenanigans]. The 
> stream-benchmark-results.md file includes the command I ran to produce the 
> result described.
> Next steps:
> Reproduce this in a pure Jetty example. I am beginning to think that multiple 
> large responses streamed simultaneously between the same client and server 
> may hit some kind of edge case in the library or the protocol itself. It may 
> have something to do with how Jetty's InputStreamResponseListener is 
> implemented, although according to the docs it _should_ be compatible with 
> HTTP/2. Furthermore, there may be other levers offered by HTTP/2 which are 
> not yet exposed by the Jetty API.
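A minimal standalone reproduction along those lines might look like the sketch below. It assumes Jetty 10.x class names (they moved in Jetty 12), and the URL, window sizes, thread count, and the artificial consumer delay are all placeholders; it needs a server serving a large response at the given endpoint, so it is an integration sketch rather than a runnable test:

```java
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Response;
import org.eclipse.jetty.client.util.InputStreamResponseListener;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class Http2StreamRepro {
    public static void main(String[] args) throws Exception {
        HTTP2Client http2 = new HTTP2Client();
        http2.setInitialSessionRecvWindow(64 * 1024);   // small shared session window
        http2.setInitialStreamRecvWindow(1024 * 1024);  // large per-stream window
        HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2));
        client.start();

        // Stream several large responses at once over one multiplexed connection.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                InputStreamResponseListener listener = new InputStreamResponseListener();
                client.newRequest("http://localhost:8983/large-response").send(listener);
                Response response = listener.get(15, TimeUnit.SECONDS);
                try (InputStream in = listener.getInputStream()) {
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) {
                        Thread.sleep(1); // slow consumer exaggerates flow-control pressure
                    }
                }
                return response.getStatus();
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        client.stop();
    }
}
```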
> On the other hand, we could consider having separate connection pools for 
> HTTP clients that stream large responses. There seems to be at least [some 
> precedent|https://www.akamai.com/site/en/documents/research-paper/domain-sharding-for-faster-http2-in-lossy-cellular-networks.pdf]
>  for doing this.
> > We investigate and develop a new domain-sharding technique that isolates 
> > large downloads on separate TCP connections, while keeping downloads of 
> > small objects on a single connection.
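As a sketch of what that separation could look like on the SolrJ side: the split itself is the proposal here, not an existing Solr feature, and the `useHttp1_1` builder toggle is the SolrJ 9.x way (if memory serves) to pin a client to HTTP/1.1:

```java
import org.apache.solr.client.solrj.impl.Http2SolrClient;

public class SplitClients {
    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8983/solr"; // placeholder
        // Regular, small request/response traffic keeps the multiplexed HTTP/2 client.
        Http2SolrClient generalClient = new Http2SolrClient.Builder(baseUrl).build();
        // Bulk streaming (stream handler, index recovery) gets its own pool,
        // here pinned to HTTP/1.1 so it sidesteps HTTP/2 flow control entirely.
        Http2SolrClient streamingClient = new Http2SolrClient.Builder(baseUrl)
            .useHttp1_1(true)
            .build();
        // ... route large-response calls through streamingClient ...
        streamingClient.close();
        generalClient.close();
    }
}
```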
> HTTP/2 seems designed for [bursty, small 
> traffic|https://hpbn.co/http2/#one-connection-per-origin], which is why 
> flow control may not hurt that case as much. Also, if your payload is small 
> relative to your headers then HTTP/2's header compression might be a big 
> win, but for large responses, not so much.
> > Most HTTP transfers are short and bursty, whereas TCP is optimized for 
> > long-lived, bulk data transfers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
