[ https://issues.apache.org/jira/browse/SOLR-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020163#comment-18020163 ]

Sanjay Dutt edited comment on SOLR-17916 at 9/14/25 2:32 PM:
-------------------------------------------------------------

{quote}Can we configure the rate control?
{quote}
This test can send hundreds of requests in a second, which fans out into 
thousands once each request hits three shards. That makes it harder to come up 
with any realistic limit.
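As a rough back-of-the-envelope sketch of why a fixed cap is hard to pick (the per-iteration cancel count here is an assumption for illustration, not a measurement):

```java
/** Back-of-the-envelope count of RST_STREAM frames the test loop can generate.
 *  Assumes each non-tolerant iteration cancels the two still-live shard
 *  requests after the dead shard fails, one RST_STREAM per cancel. */
public class RstStreamEstimate {
    static int rstStreams(int iterations, int cancelsPerIteration) {
        return iterations * cancelsPerIteration;
    }

    public static void main(String[] args) {
        // 200 iterations in a normal run, 500 with TEST_NIGHTLY
        System.out.println(rstStreams(200, 2)); // prints 400
        System.out.println(rstStreams(500, 2)); // prints 1000
    }
}
```

Even the smaller run can emit several hundred RST_STREAM frames in well under a second, so any per-second cap the test could reliably pass would be implausibly high for production.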

For reference, here’s how the limit can be set:

*In {{{}JettySolrRunner{}}}:*
{code:java}
HTTP2CServerConnectionFactory http2ConnectionFactory =
    new HTTP2CServerConnectionFactory(configuration);
http2ConnectionFactory.setRateControlFactory(new WindowRateControl.Factory(256));

connector =
    new ServerConnector(
        server,
        new HttpConnectionFactory(configuration),
        http2ConnectionFactory); // use the configured factory, not a fresh one
{code}
 
*In {{{}jetty-https.xml{}}}:*
{code:xml}
<New class="org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory">
  <Arg name="config"><Ref refid="sslHttpConfig"/></Arg>
  <Set name="rateControlFactory">
    <New class="org.eclipse.jetty.http2.WindowRateControl$Factory">
      <Arg type="int"><Property name="jetty.http2.rateControl.maxEventsPerSecond" default="256"/></Arg>
    </New>
  </Set>
</New>{code}
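To make the {{maxEventsPerSecond}} knob concrete, here is a minimal plain-Java sketch of a sliding one-second window, similar in spirit to what Jetty's {{WindowRateControl}} does (class and method names here are illustrative, not Jetty's API):

```java
import java.util.ArrayDeque;

/** Illustrative sliding-window rate control: admits at most
 *  maxEventsPerSecond events within any rolling one-second window. */
public class SimpleWindowRateControl {
    private final int maxEventsPerSecond;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    public SimpleWindowRateControl(int maxEventsPerSecond) {
        this.maxEventsPerSecond = maxEventsPerSecond;
    }

    /** Returns true if the event is admitted, false once the rate is exceeded
     *  (the point at which Jetty instead replies with a GOAWAY). */
    public synchronized boolean onEvent(long nowNanos) {
        long windowStart = nowNanos - 1_000_000_000L;
        while (!timestamps.isEmpty() && timestamps.peekFirst() < windowStart) {
            timestamps.pollFirst(); // drop events older than one second
        }
        if (timestamps.size() >= maxEventsPerSecond) {
            return false;
        }
        timestamps.addLast(nowNanos);
        return true;
    }
}
```

With a cap of 256, a burst of several hundred RST_STREAM events inside one window would start returning false partway through, which is exactly where the server gives up on the session.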
 
{quote}Should we just disable it?
{quote}
The rate control exists because of [this 
CVE|https://github.com/advisories/GHSA-mmxm-8w33-wc4h]:

_Thus, the attacker can exploit this vulnerability to cause the server to 
handle an unbounded number of concurrent streams from a client on the same 
connection. The exploitation is very simple: the client issues a request in a 
stream, and then sends the control frame that causes the server to send a 
RST_STREAM._
{quote}If not, maybe this test should use HTTP 1.1 to avoid this?
{quote}
Are you talking about starting Solr itself with HTTP/1.1? The tests already use 
{{{}HttpSolrClient{}}}; the connection that trips the limit is the one used by 
{{{}HttpShardHandler{}}}.

Have you noticed that other tests in the same class were also failing, even 
though they have nothing to do with {{shards.tolerant}} queries?

Once {{testTolerantSearch}} crosses the limit, the entire HTTP/2 session used 
by {{HttpShardHandler}} is closed. That closes all streams on that session, 
which is why the other tests fail as well. In production, if that happened (and 
I believe such a scenario would be very easy to create), we could easily get 
into a situation where {{HttpShardHandler}} silently closes or fails other 
in-flight requests too.
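To illustrate why unrelated requests fail too, here is a toy model (not Jetty code; the names are made up for illustration) of many requests multiplexed over one shared session: a GOAWAY-style close fails every in-flight stream, including perfectly healthy ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of an HTTP/2 session multiplexing many request streams. */
public class SharedSession {
    private final List<CompletableFuture<String>> streams = new ArrayList<>();

    /** Opens a new logical stream on this session. */
    public CompletableFuture<String> newStream() {
        CompletableFuture<String> stream = new CompletableFuture<>();
        streams.add(stream);
        return stream;
    }

    /** A GOAWAY-style close: every open stream fails, healthy or not. */
    public void close(String reason) {
        for (CompletableFuture<String> stream : streams) {
            stream.completeExceptionally(new IllegalStateException(reason));
        }
    }
}
```

In this model, a single misbehaving request that triggers {{close("enhance_your_calm")}} also fails every other stream sharing the session, mirroring how one test can poison the rest of the class.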



> Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
> ----------------------------------------------------
>
>                 Key: SOLR-17916
>                 URL: https://issues.apache.org/jira/browse/SOLR-17916
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Sanjay Dutt
>            Priority: Major
>
> After upgrading Jetty from {*}12.0.19 → 12.0.25{*}, the test 
> {{DistributedDebugComponentTest.testTolerantSearch}} starts failing.
> The test sets up a query with a deliberately bad shard:
> {code:java}
> String badShard = DEAD_HOST_1 + "/solr/collection1";
> query.set("shards", badShard + "," + shard2 + "," + shard1);
> for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
>   // verify that the request would fail if shards.tolerant=false
>   query.set(ShardParams.SHARDS_TOLERANT, "false");
>   ignoreException("Connection refused");
>   expectThrows(SolrException.class, () -> collection1.query(query));
>   // verify that the request would succeed if shards.tolerant=true
>   query.set(ShardParams.SHARDS_TOLERANT, "true");
>   QueryResponse response = collection1.query(query); // fail here!
> ....
> {code}
> For each iteration, it issues:
>  * *shards.tolerant = false* → as expected, the coordinator fails fast 
> because one shard is dead.
>  * *shards.tolerant = true* → expected to succeed using results from the good 
> shard(s), but {*}fails after the Jetty upgrade{*}.
> *Observed behavior*
>  * In the non-tolerant branch, {{SearchHandler}} throws early on the shard 
> exception.
>  * At this point {{HttpShardHandler}} cancels the outstanding async requests 
> to the other shards, calling {{future.cancel(true)}} / 
> {{{}request.abort(){}}}.
>  * That abort translates into *RST_STREAM* frames sent to Jetty.
>  * With the loop running hundreds of iterations, these cancels accumulate on 
> a single HTTP/2 session.
>  * Jetty 12.0.25 enforces stricter HTTP/2 rate control:
> GoAwayFrame\{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
>  * Once the rate limit is tripped, the server responds with GOAWAY and closes 
> the connection.
>  * The subsequent tolerant request then fails, even though at least one shard 
> is healthy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
