Hey Jason,

I also suspect that a lot of people may
still be running in HTTP/1 mode and have
simply forgotten about it.

A common issue we see is the error

SOLR Max requests queued per destination 3000 exceeded for HttpDestination

for which there is already a Stack Overflow
question.

The other thing one might see is random
replication stalling, the most insidious
case being pull replicas whose connections
to the leader get busted but which then
remain active indefinitely, since there is
no mechanism to alert on such a stalled
connection. Perhaps that is an area of
improvement we can look at. We have built
some monitoring workarounds for this use
case internally, but there could be
something to be done upstream.
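As an aside, that queue limit is configurable
on Jetty's HTTP client, which is worth knowing
even though raising it mostly hides the symptom
when requests are stalling rather than
completing. A rough sketch against the raw
Jetty API (the value here is illustrative, not
a recommendation):

```java
import org.eclipse.jetty.client.HttpClient;

public class QueueLimitSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // Raise the per-destination request queue; the 3000 in the
        // error above suggests a deployment-configured limit rather
        // than Jetty's stock default. Illustrative value only.
        client.setMaxRequestsQueuedPerDestination(6000);
        client.start();
        try {
            System.out.println(client.getMaxRequestsQueuedPerDestination());
        } finally {
            client.stop();
        }
    }
}
```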

Finally, one symptom you could see is the
aforementioned /stream calls timing out:
the client and server can get deadlocked,
and you eventually hit the idle timeout.
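One partial mitigation is keeping the client
idle timeout low enough that a stalled stream
fails visibly instead of hanging forever. A
sketch against Jetty's HttpClient (the
two-minute value is purely illustrative):

```java
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.client.HttpClient;

public class IdleTimeoutSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // Abort any connection with no traffic for two minutes so a
        // deadlocked stream surfaces as a timeout instead of a hang.
        // Tune this above your slowest legitimate /stream call.
        client.setIdleTimeout(TimeUnit.MINUTES.toMillis(2));
        client.start();
        client.stop();
    }
}
```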

All of this is more prone to happen if you
have a large number of co-located shards.

Luke
Sent from Bloomberg Professional for Android

----- Original Message -----
From: Jason Gerlowski <[email protected]>
To: [email protected]
At: 02/13/26 10:10:08 UTC-05:00


Hey Luke,

> One thing that has puzzled me is how no one else seems to be
> complaining about this :-)

I think a lot of folks out there are probably hitting the same issues
and just don't realize it because their opsviz dashboards aren't quite
as evolved, or they don't have the time or expertise to dig in.  I
also know there are a lot of folks who just aren't running HTTP/2
yet.  There were a lot of issues with Solr's HTTP/2 support when it
was first introduced, and I think a lot of folks turned it off around
that time and still run with it disabled today.

I'm curious - are there particular log messages that would be a good
indicator that someone is hitting this problem?  Something easy that
folks could use to check even if they don't have a huge amount of time
on-hand to dig in?

> reproduce this with a minimal Jetty
> example, without Solr in the mix

That seems like the most promising route forward to me.  👍
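For concreteness, a standalone reproducer of that pattern might look
something like the sketch below. Everything in it is an assumption
rather than code from this thread: the URL, the stream count, and the
Jetty 12 package names (in Jetty 11, InputStreamResponseListener lives
in org.eclipse.jetty.client.util).

```java
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.InputStreamResponseListener;
import org.eclipse.jetty.client.Response;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.transport.HttpClientTransportOverHTTP2;

public class Http2BulkStreamRepro {
    public static void main(String[] args) throws Exception {
        // Use the HTTP/2 transport so downloads to the same origin are
        // multiplexed over a shared connection and subject to
        // connection-level flow control.
        HttpClientTransportOverHTTP2 transport =
                new HttpClientTransportOverHTTP2(new HTTP2Client());
        HttpClient client = new HttpClient(transport);
        client.start();
        try {
            int concurrentStreams = 8; // assumed
            Thread[] readers = new Thread[concurrentStreams];
            for (int i = 0; i < concurrentStreams; i++) {
                InputStreamResponseListener listener = new InputStreamResponseListener();
                // Assumed endpoint serving a large response body.
                client.newRequest("http://localhost:8080/largefile").send(listener);
                Response response = listener.get(15, TimeUnit.SECONDS);
                if (response.getStatus() != 200)
                    throw new IllegalStateException("HTTP " + response.getStatus());
                readers[i] = new Thread(() -> {
                    // Drain the stream; timing this across stream counts
                    // is where an HTTP/1 vs HTTP/2 gap would show up.
                    try (InputStream in = listener.getInputStream()) {
                        byte[] buf = new byte[64 * 1024];
                        while (in.read(buf) != -1) { /* discard */ }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
                readers[i].start();
            }
            for (Thread t : readers) t.join();
        } finally {
            client.stop();
        }
    }
}
```

The interesting measurement is wall-clock time to drain all streams as
the stream count grows, compared against the same loop run over an
HTTP/1.1 transport.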

Best,

Jason

On Wed, Jan 28, 2026 at 4:47 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD
A) <[email protected]> wrote:
>
> Hi All,
>
> I've raised this before during a meetup and on the dev slack
> but I'd like to raise it again after a more thorough review
> on my part. HTTP/2 seems to struggle with streaming large
> responses relative to HTTP/1. I was hoping the problem would
> "go away" with the latest versions but I can reproduce the same
> slowness and occasional stalling that we saw with Solr 9.x/Jetty 9.x,
> running very recent Solr main and Jetty 12.
>
> After observing the same issues, I decided to do a deeper dive on
> HTTP/2 and Jetty's HTTP/2 API. I found a variety of levers to tune
> flow control (one of the major architectural shifts of HTTP/2 over
> HTTP/1), but, TL;DR, none of them reliably improved performance.
> You can read a more detailed version of the analysis here:
> https://issues.apache.org/jira/browse/SOLR-18087
>
> Hopefully some of the tests I ran can be reproduced by running the
> benchmarks I added here:
> https://github.com/apache/solr/pull/4079
>
> The linked Jira ticket has, among other things, an attachment
> detailing the stream benchmark results as well as the exact JMH
> command that was run to achieve each result listed.
>
> A possible next step would be to reproduce this with a minimal Jetty
> example, without Solr in the mix. At a high level, we are streaming
> several large files concurrently over the same HTTP/2 connection,
> using Jetty’s InputStreamResponseListener to expose each response as
> an InputStream. If we can demonstrate the degradation in a small
> standalone test, we could share it with the Jetty project to see
> if there are optimizations we are missing, or additional
> flow-control knobs that should be exposed.
>
> My current understanding is that HTTP/2 is a bigger win for smaller
> request/response traffic on connections that are often idle, where
> header compression and multiplexing help and flow control is less
> likely to be the bottleneck. For concurrent bulk streaming,
> HTTP-layer flow control seems to hurt performance compared with
> plain TCP, which handles this kind of workload famously well.
>
> One thing that has puzzled me is how no one else seems to be
> complaining about this :-). It's possible that our set-up is
> unique, i.e. the problem is exacerbated by multiple shards
> co-located on a single, addressable node. We may also not be
> fully utilizing our network bandwidth with a single TCP connection
> and thus piling on to the flow control overhead (but my testing
> suggests flow control is significantly contributing to this).
>
> I'd appreciate any thoughts the community may have about this
> issue. I'd also love to hear about your Solr topology (if you
> are able to share), i.e. how many shards do you have on a single
> process and whether these shards share a single address from the
> perspective of other nodes.
>
> Thanks,
> Luke

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
