Basically I want to do something very similar to the command below, but programmatically as part of a C++ application using libcurl:

curl --retry 10 --retry-all-errors --remote-name-all --parallel --parallel-max 150 "https://api.pwnedpasswords.com/range/000{0,1,2,3}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}" > curl.log 2>&1

The above retrieves 64 text files, each about 32 kB. On a cheap VM with
a Gbit internet connection this takes only about 0.2 seconds. Awesome.

I started with this example

https://curl.se/libcurl/c/multi-event.html

from the official site, used verbatim.

Compiled like this:
gcc -O3 -Wall -Wextra -Wno-unused-parameter -std=c11 -o multi multi.c -lcurl -levent

I am on Ubuntu 24.04:
$ uname -a
Linux oliver 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l | egrep 'libcurl|libevent|openssl' | awk '{print $2,$3}' | column -t
libcurl3t64-gnutls:amd64          8.5.0-2ubuntu10.4
libcurl4t64:amd64                 8.5.0-2ubuntu10.4
libevent-2.1-7t64:amd64           2.1.12-stable-9ubuntu2
libevent-core-2.1-7t64:amd64      2.1.12-stable-9ubuntu2
libevent-dev                      2.1.12-stable-9ubuntu2
libevent-extra-2.1-7t64:amd64     2.1.12-stable-9ubuntu2
libevent-openssl-2.1-7t64:amd64   2.1.12-stable-9ubuntu2
libevent-pthreads-2.1-7t64:amd64  2.1.12-stable-9ubuntu2
openssl                           3.0.13-0ubuntu3.4

If I pass the same 64 urls to the resulting program as `argv` like this:
./multi \
     "https://api.pwnedpasswords.com/range/00000" \
     "https://api.pwnedpasswords.com/range/00001" \
     "https://api.pwnedpasswords.com/range/00002" \
     "https://api.pwnedpasswords.com/range/00003" \
     "https://api.pwnedpasswords.com/range/00004" \
     "https://api.pwnedpasswords.com/range/00005" \
     "https://api.pwnedpasswords.com/range/00006" \
     "https://api.pwnedpasswords.com/range/00007" \
     "https://api.pwnedpasswords.com/range/00008" \
     "https://api.pwnedpasswords.com/range/00009" \
     "https://api.pwnedpasswords.com/range/0000A" \
     "https://api.pwnedpasswords.com/range/0000B" \
     "https://api.pwnedpasswords.com/range/0000C" \
     "https://api.pwnedpasswords.com/range/0000D" \
     "https://api.pwnedpasswords.com/range/0000E" \
     "https://api.pwnedpasswords.com/range/0000F" \
     "https://api.pwnedpasswords.com/range/00010" \
     "https://api.pwnedpasswords.com/range/00011" \
     "https://api.pwnedpasswords.com/range/00012" \
     "https://api.pwnedpasswords.com/range/00013" \
     "https://api.pwnedpasswords.com/range/00014" \
     "https://api.pwnedpasswords.com/range/00015" \
     "https://api.pwnedpasswords.com/range/00016" \
     "https://api.pwnedpasswords.com/range/00017" \
     "https://api.pwnedpasswords.com/range/00018" \
     "https://api.pwnedpasswords.com/range/00019" \
     "https://api.pwnedpasswords.com/range/0001A" \
     "https://api.pwnedpasswords.com/range/0001B" \
     "https://api.pwnedpasswords.com/range/0001C" \
     "https://api.pwnedpasswords.com/range/0001D" \
     "https://api.pwnedpasswords.com/range/0001E" \
     "https://api.pwnedpasswords.com/range/0001F" \
     "https://api.pwnedpasswords.com/range/00020" \
     "https://api.pwnedpasswords.com/range/00021" \
     "https://api.pwnedpasswords.com/range/00022" \
     "https://api.pwnedpasswords.com/range/00023" \
     "https://api.pwnedpasswords.com/range/00024" \
     "https://api.pwnedpasswords.com/range/00025" \
     "https://api.pwnedpasswords.com/range/00026" \
     "https://api.pwnedpasswords.com/range/00027" \
     "https://api.pwnedpasswords.com/range/00028" \
     "https://api.pwnedpasswords.com/range/00029" \
     "https://api.pwnedpasswords.com/range/0002A" \
     "https://api.pwnedpasswords.com/range/0002B" \
     "https://api.pwnedpasswords.com/range/0002C" \
     "https://api.pwnedpasswords.com/range/0002D" \
     "https://api.pwnedpasswords.com/range/0002E" \
     "https://api.pwnedpasswords.com/range/0002F" \
     "https://api.pwnedpasswords.com/range/00030" \
     "https://api.pwnedpasswords.com/range/00031" \
     "https://api.pwnedpasswords.com/range/00032" \
     "https://api.pwnedpasswords.com/range/00033" \
     "https://api.pwnedpasswords.com/range/00034" \
     "https://api.pwnedpasswords.com/range/00035" \
     "https://api.pwnedpasswords.com/range/00036" \
     "https://api.pwnedpasswords.com/range/00037" \
     "https://api.pwnedpasswords.com/range/00038" \
     "https://api.pwnedpasswords.com/range/00039" \
     "https://api.pwnedpasswords.com/range/0003A" \
     "https://api.pwnedpasswords.com/range/0003B" \
     "https://api.pwnedpasswords.com/range/0003C" \
     "https://api.pwnedpasswords.com/range/0003D" \
     "https://api.pwnedpasswords.com/range/0003E" \
     "https://api.pwnedpasswords.com/range/0003F" \
     ;


It gets the files OK, but takes 3 seconds. `top` shows 100% CPU, i.e. it is CPU bound.

15x slower.

That makes this approach infeasible, as I need to retrieve 1 million such files.
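
For the full run the URLs would have to be generated rather than typed out. A minimal sketch of how I would do that, assuming the complete set is every 5-hex-digit prefix (16^5 = 1,048,576, which is where the "1 million" comes from). The names `RANGE_BASE` and `range_url` are made up for this illustration:

```c
#include <stdio.h>

/* Base taken from the URLs above; the helper name is hypothetical. */
#define RANGE_BASE "https://api.pwnedpasswords.com/range/"

/*
 * Write the URL for the 5-hex-digit prefix number `index` into `buf`.
 * Returns the URL length, or -1 if `index` is out of range or `buf`
 * is too small to hold the whole URL plus the terminating NUL.
 */
int range_url(char *buf, size_t len, unsigned long index)
{
    if (index > 0xFFFFFUL)
        return -1;                      /* more than 5 hex digits */

    int n = snprintf(buf, len, RANGE_BASE "%05lX", index);
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* encoding error or truncated */
    return n;
}
```

Calling `range_url(buf, sizeof buf, i)` for `i` from 0 to 63 gives the same 64 URLs as the command line above (`00000` through `0003F`).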

This article

https://daniel.haxx.se/docs/poll-vs-select.html

suggests that event-based curl_multi is the fastest. That's why I
chose that example using libevent.

I checked the curl_multi options:
https://curl.se/libcurl/c/multi_setopt_options.html

to ensure I was getting connection pooling (all on the same domain
etc), and I didn't find anything to suggest I was not. The server for
the urls above offers HTTP/2 with TLS 1.3. I haven't checked, but potentially `curl --parallel` is using a single connection with HTTP/2 streams?

What is `curl --parallel --parallel-max 150` doing internally and how
can I reproduce this performance with libcurl?

Many thanks for any help/input.

Oliver
