Hello curl developers,

We use libcurl quite extensively in an open source program called
nbdkit:

  https://gitlab.com/nbdkit/nbdkit/-/tree/master/plugins/curl

nbdkit is a server which serves Network Block Device (NBD) on one side
and (in the case where we use curl) forwards to a web server.  An
example usage might be:

  $ nbdkit -r curl https://example.com/disk.img

Then we'd access the server using the NBD protocol on port 10809 in
order to random-access 'disk.img'.  eg. to boot the disk in a VM you
might do something like:

  $ qemu-system-x86_64 -M q35,accel=kvm -cpu host -m 2048 \
      -drive file.driver=nbd,file.host=localhost,if=virtio,snapshot=on

Or download the image to local using:

  $ nbdcopy nbd://localhost disk.img

(nbdcopy uses multiple threads and is not really like a curl download).

Modern NBD is a fast, efficient protocol that supports pipelining
requests and spreading requests across multiple connections to the
server (nbdkit).

Recently we've been having debates in the upstream community about
performance of our curl-based plugin.  Performance has become a
paramount concern for some of our users, even if the implementation
becomes complex.

As currently implemented, nbdkit will start many threads (16+) and
dispatch NBD requests to the curl plugin from those threads in
parallel[1].  There is a pool of (by default 4) libcurl easy handles,
all configured in the same way.  When a request comes in, it picks a
free handle (or waits for one to become free) and then synchronously
makes the HTTP/HTTPS request using WRITEFUNCTION/WRITEDATA +
curl_easy_perform.

          nbdkit            nbdkit-curl-plugin
  --------------------->  \         +--------+
  --------------------->   \        | CURL*  --------> web server
  --------------------->            | CURL*  --------> web server
  --------------------->   /        | CURL*  --------> web server
  --------------------->  /         | CURL*  --------> web server
  --------------------->   get/put  +--------+
  --------------------->            handles pool
  threads, each doing a single
  NBD request + reply

Observing the current plugin shows that curl is opening up to 4 TCP
connections to the web server, as expected.

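To make the get/put handle pool described above a bit more concrete,
here is a minimal sketch of the idea.  It is not the actual nbdkit
code: struct handle_pool, pool_init(), worker() and run_demo() are
names made up for illustration, the handles are opaque pointers
standing in for CURL* easy handles, and no HTTP request is made.
get_handle() and put_handle() borrow the names used in the plugin, but
the bodies here are a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4      /* the plugin's default number of handles */
#define MAX_THREADS 64   /* arbitrary limit for this sketch */

/* Hypothetical pool.  In the plugin each slot would hold a CURL *. */
struct handle_pool {
  pthread_mutex_t lock;
  pthread_cond_t cond;   /* signalled whenever a handle is returned */
  void *handles[POOL_SIZE];
  int in_use[POOL_SIZE];
  int busy;              /* number of handles currently taken */
  int max_busy;          /* high-water mark, for illustration only */
};

static void
pool_init (struct handle_pool *p, void *handles[POOL_SIZE])
{
  pthread_mutex_init (&p->lock, NULL);
  pthread_cond_init (&p->cond, NULL);
  for (size_t i = 0; i < POOL_SIZE; i++) {
    p->handles[i] = handles[i];
    p->in_use[i] = 0;
  }
  p->busy = p->max_busy = 0;
}

/* Take a free handle, blocking until one becomes available. */
static void *
get_handle (struct handle_pool *p)
{
  void *h = NULL;

  pthread_mutex_lock (&p->lock);
  while (p->busy == POOL_SIZE)
    pthread_cond_wait (&p->cond, &p->lock);
  for (size_t i = 0; i < POOL_SIZE; i++) {
    if (!p->in_use[i]) {
      p->in_use[i] = 1;
      h = p->handles[i];
      break;
    }
  }
  p->busy++;
  if (p->busy > p->max_busy)
    p->max_busy = p->busy;
  pthread_mutex_unlock (&p->lock);
  assert (h != NULL);
  return h;
}

/* Return a handle to the pool and wake one waiting thread. */
static void
put_handle (struct handle_pool *p, void *h)
{
  pthread_mutex_lock (&p->lock);
  for (size_t i = 0; i < POOL_SIZE; i++) {
    if (p->handles[i] == h) {
      assert (p->in_use[i]);
      p->in_use[i] = 0;
      p->busy--;
      break;
    }
  }
  pthread_cond_signal (&p->cond);
  pthread_mutex_unlock (&p->lock);
}

struct worker_args {
  struct handle_pool *pool;
  int nr_requests;
};

/* Stands in for one nbdkit thread handling NBD requests one by one. */
static void *
worker (void *argv)
{
  struct worker_args *args = argv;

  for (int i = 0; i < args->nr_requests; i++) {
    void *h = get_handle (args->pool);
    /* The plugin would make the synchronous HTTP request on h here. */
    put_handle (args->pool, h);
  }
  return NULL;
}

/* Run the workers.  Returns the largest number of handles that were
 * in use at the same time, or -1 on error.
 */
static int
run_demo (int nr_threads, int nr_requests)
{
  static int dummy[POOL_SIZE];   /* placeholders for real handles */
  void *hs[POOL_SIZE];
  struct handle_pool pool;
  struct worker_args args = { &pool, nr_requests };
  pthread_t threads[MAX_THREADS];
  int started = 0;

  if (nr_threads < 1 || nr_threads > MAX_THREADS)
    return -1;
  for (size_t i = 0; i < POOL_SIZE; i++)
    hs[i] = &dummy[i];
  pool_init (&pool, hs);
  while (started < nr_threads &&
         pthread_create (&threads[started], NULL, worker, &args) == 0)
    started++;
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);
  pthread_cond_destroy (&pool.cond);
  pthread_mutex_destroy (&pool.lock);
  if (started < nr_threads)
    return -1;
  assert (pool.busy == 0);
  return pool.max_busy;
}
```

Calling run_demo (16, 1000) from a main() starts 16 threads which each
borrow a handle 1000 times.  The value returned is the largest number
of handles that were out at once, and it can never exceed POOL_SIZE,
which corresponds to the cap of 4 connections described above.
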
Firstly, I don't understand if the multi interface would actually help
us here.  Because nbdkit gives us lots of threads and expects an NBD
request to be processed synchronously on that thread, using the easy
interface is a natural .. easy(!) .. fit.  We could create a separate,
new pool of threads, eg. one per CURL* handle, but that seems like it
would add more overhead as we pass requests from the nbdkit threads to
the new thread pool using some kind of queue structure.

A second thing I'm unclear about with multi is whether the individual
easy handles which are added are related in any way -- eg. if they all
share the same TCP connection to the web server?  Reading the page
makes me think this is not the case, the multi interface is just a way
to group easy handles for the purposes of using a select/poll or
event-driven API, and apart from that there is no relationship.

The third and main concern is whether we are using curl most
efficiently.  In particular, whether we are using HTTP/2 (and in future
HTTP/3) as efficiently as we could be (eg. exploiting multiplexing).  I
notice that HTTP/1.1-style pipelining was removed from curl, and I
suppose HTTP/2 multiplexing is meant to replace this.  However since we
are using the easy interface and doing everything synchronously, it's
my understanding that we are not exploiting multiplexing, unless curl
itself does something clever internally.

Any comments on this design and thoughts on ways we could improve
things are most welcome.

TIA,

Rich.

----------------------------------------------------------------------
Notes

[1] Actually not in parallel right now because we found that fully
parallel requests caused the plugin to slow down.
However if you patch nbdkit-curl-plugin like this then it works the
way I describe above:

diff --git a/plugins/curl/curl.c b/plugins/curl/curl.c
index 70c0a9ec9..47c9d7d41 100644
--- a/plugins/curl/curl.c
+++ b/plugins/curl/curl.c
@@ -474,7 +474,7 @@ curl_close (void *handle)
  * of pessimising common workloads.  See:
  * https://listman.redhat.com/archives/libguestfs/2023-February/030618.html
  */
-#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
 
 /* Calls get_handle() ... put_handle() to get a handle for the length
  * of the current scope.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW