I have a cron job scraping some content from GitHub every night (about 20k small files). It worked well for a year, but recently something changed: after a few minutes GitHub starts returning a lot of 403s, and then after another minute I start getting thousands of these:
  HTTP/2 stream 20135 was not closed cleanly before end of the underlying stream
  HTTP/2 stream 20137 was not closed cleanly before end of the underlying stream
  HTTP/2 stream 20139 was not closed cleanly before end of the underlying stream

So either they introduced a server bug, or GitHub is deliberately blocking what it considers abusive behavior due to high concurrency.

I am using a multi handle with CURLPIPE_MULTIPLEX and otherwise default settings. Am I correct that this means libcurl starts 100 concurrent streams per connection (CURLMOPT_MAX_CONCURRENT_STREAMS) and still opens 6 concurrent connections per host (CURLMOPT_MAX_HOST_CONNECTIONS), i.e. downloads 600 files in parallel? I can imagine that could be considered abusive. Should I set CURLMOPT_MAX_HOST_CONNECTIONS to 1 when using HTTP/2 multiplexing? Or is CURLMOPT_MAX_HOST_CONNECTIONS ignored when multiplexing?

One other thing I noticed is that GitHub does not seem to advertise any MAX_CONCURRENT_STREAMS, or at least I am not seeing one. For example, on httpbin I see this:

  curl -v 'https://httpbin.org/get' --http2
  * Connection state changed (MAX_CONCURRENT_STREAMS == 128)!

However, for GitHub I don't see such a line:

  curl -v 'https://raw.githubusercontent.com/curl/curl/master/README' --http2

So does this mean libcurl will assume 100 streams is OK? Is there a way to debug this and monitor how many active downloads a multi handle is making in total (summed over all connections)? Afaik, the 'running_handles' value from curl_multi_perform() gives me the total number of uncompleted transfers, including those that have not started yet, so it does not tell me how many are actually in progress.
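For reference, here is a minimal sketch of what I am considering to cap the concurrency and count started-but-unfinished transfers myself, assuming libcurl >= 7.67.0 (needed for CURLMOPT_MAX_CONCURRENT_STREAMS). The 'in_progress' counter and the 'xfer' struct are my own bookkeeping, not anything libcurl reports:

  #include <curl/curl.h>
  #include <stdlib.h>

  static int in_progress; /* transfers started but not yet finished */

  struct xfer {
    int started;
  };

  /* the first progress callback means the transfer has actually begun */
  static int xferinfo(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                      curl_off_t ultotal, curl_off_t ulnow)
  {
    struct xfer *x = clientp;
    if(!x->started) {
      x->started = 1;
      in_progress++;
    }
    (void)dltotal; (void)dlnow; (void)ultotal; (void)ulnow;
    return 0; /* continue the transfer */
  }

  static void setup_multi(CURLM *multi)
  {
    /* enable HTTP/2 multiplexing */
    curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
    /* never open more than one connection per host... */
    curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 1L);
    /* ...and use far fewer streams than the 100 default */
    curl_multi_setopt(multi, CURLMOPT_MAX_CONCURRENT_STREAMS, 10L);
  }

  static void add_download(CURLM *multi, const char *url)
  {
    CURL *easy = curl_easy_init();
    struct xfer *x = calloc(1, sizeof(*x));
    curl_easy_setopt(easy, CURLOPT_URL, url);
    curl_easy_setopt(easy, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2TLS);
    /* wait for a multiplexed stream instead of opening a new connection */
    curl_easy_setopt(easy, CURLOPT_PIPEWAIT, 1L);
    curl_easy_setopt(easy, CURLOPT_NOPROGRESS, 0L);
    curl_easy_setopt(easy, CURLOPT_XFERINFOFUNCTION, xferinfo);
    curl_easy_setopt(easy, CURLOPT_XFERINFODATA, x);
    curl_easy_setopt(easy, CURLOPT_PRIVATE, x); /* so we can free it later */
    curl_multi_add_handle(multi, easy);
  }

  /* in the event loop, decrement the counter as transfers complete */
  static void reap(CURLM *multi)
  {
    CURLMsg *msg;
    int queued;
    while((msg = curl_multi_info_read(multi, &queued))) {
      if(msg->msg == CURLMSG_DONE) {
        struct xfer *x;
        curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE, (char **)&x);
        if(x->started)
          in_progress--;
        free(x);
        curl_multi_remove_handle(multi, msg->easy_handle);
        curl_easy_cleanup(msg->easy_handle);
      }
    }
  }

My understanding is that CURLOPT_PIPEWAIT makes libcurl wait to multiplex on an existing connection rather than open a new one, so together with CURLMOPT_MAX_HOST_CONNECTIONS == 1 everything should go over a single connection, and 'in_progress' would then tell me how many transfers are actually running. Does that sound right?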