Hello, I work at Protocol Labs on the IPFS Stewards Team, working on the Go implementations, and I have done some specs rewriting. (GH profile: https://github.com/Jorropo/)
Mark recently added ipfs:// support in the curl CLI <https://github.com/curl/curl/commit/65b563a96a226649ba12cb1ec7b5c4c538ec1c08>, however sadly it does not perform validation on the data received. I am interested in fixing that, as well as moving the support into libcurl. I already wrote a PoC in Go: https://github.com/Jorropo/go-featheripfs and I am productionising that code in our Boxo Go library: https://github.com/ipfs/boxo/pull/347

I have questions about how I should go about it on two points.

# How should a libcurl protocol go about reusing another libcurl protocol?

What I implement is the trustless gateway protocol <https://specs.ipfs.tech/http-gateways/trustless-gateway/>, a request / streaming-response protocol over HTTP. I do a point-to-point implementation, no tricky P2P or concurrent-download things. (A minimal sketch of a single request follows the three-part list further down.)

To consume HTTP in the IPFS protocol I have created an IPFS state-machine struct which I have added to the SingleRequest.p union. This object itself contains a CURL* field of the user-facing libcurl API (I will probably have to move to CURLM*, but I didn't get to this problem yet). Is it acceptable for libcurl internals to use the public libcurl API?

Given that I have to implement a Curl_handler myself, I could see a world where targeting Curl_handler_http makes sense, as I can forward my implementations of the Curl_handler methods to Curl_handler_http with the required wrapping added. However, url.c (and more?) does preprocessing which is nice for me, and having two code paths to reach HTTP would lead to duplicated code and be a breeding ground for bugs; it is also not as easy as just forwarding to http, since sometimes I need HTTPS, H2, H3, ...

This could become tricky depending on the semantics we want to attach; for example, some parameters like proxying, headers (if the user wants to add an auth token, but not Range, as that is handled by IPFS), --insecure, ... should really be passed as-is to the existing http* stack. Is there a third, better option I overlooked?

Also, it would be nice if whatever solution we pick allows doing more than one HTTP request: the state machine can use the DFS merkle-tree stack to resume downloads, so you can imagine the user supplying multiple IPFS gateways and the code transparently restarting the stream with another server, exactly where it left off with the previous one, if an IO error happens.

# Where should the true "IPFS" IPFS code live?

Concretely, the implementation could be split into 3 parts (rough Go sketches of parts 1 and 2 follow the list):

1. Decoders
   - Multibase <http://github.com/multiformats/multibase> (1-char prefix, followed by hex, base32, base64, ...)
   - Unixfs <https://github.com/ipfs/specs/pull/331> protobuf, which is the merkle-tree we use to encode unix-like objects.
   - I've already been asked to also support CBOR encoding.
   - Multihash <https://github.com/multiformats/multihash> (prefix which maps to some hash function); currently it's not really a thing in my curl code because I only support sha256, to dodge code-size debates about adding more hash functions. In case this grows, it is a switch statement from IDs to hash functions. (See the CID-decoding sketch below.)
2. The CAR <https://ipld.io/specs/transport/car/carv1/> decoding and unixfs validation state machine. This reads the incoming stream from the server, parses blocks, and maintains a DFS stack of the future blocks it expects; it reads one block (maximum 2MiB in the IPFS world currently) and copies the decoded data to the consumer once validated, repeating until the stack is empty. (See the CAR sketch below.)
3. Interfacing with the underlying HTTP library.
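To make the request/response shape concrete, here is a minimal Go sketch (Go, since that is the language of my PoC) of what a single trustless gateway fetch looks like; the gateway URL and CID are placeholders and nothing is validated yet:

```go
// Sketch of the request side of the trustless gateway protocol: a plain
// HTTP GET on /ipfs/<cid> asking for a CAR stream back. The gateway URL
// and CID below are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	const gateway = "https://example-gateway.invalid" // placeholder
	const cid = "bafy..."                             // placeholder CIDv1

	req, err := http.NewRequest("GET", gateway+"/ipfs/"+cid, nil)
	if err != nil {
		panic(err)
	}
	// Ask for the raw blocks as a CARv1 stream instead of the
	// deserialized file a path gateway would normally return.
	req.Header.Set("Accept", "application/vnd.ipld.car")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The body is an untrusted CAR stream; every block must be verified
	// against its CID before any bytes reach the consumer.
	n, _ := io.Copy(io.Discard, resp.Body)
	fmt.Println("received", n, "bytes of CAR data (unverified in this sketch)")
}
```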
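For part 1, this is roughly what the multibase + multihash decoding step looks like, restricted to the base32 prefix and sha2-256 as described above; decodeCIDv1 is an illustrative name, not existing code:

```go
// Sketch of the multibase + multihash decoding of a CIDv1 string,
// supporting only the 'b' (base32, RFC 4648 lowercase, no padding)
// prefix and the sha2-256 multihash, mirroring the "only sha256 for now"
// choice described above.
package main

import (
	"encoding/base32"
	"encoding/binary"
	"errors"
	"fmt"
)

// multibase 'b' alphabet: RFC 4648 base32, lowercase, no padding.
var base32Lower = base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)

// decodeCIDv1 parses a base32 CIDv1 string and returns the content codec
// and the expected sha2-256 digest of the block it names.
func decodeCIDv1(cid string) (codec uint64, digest []byte, err error) {
	if len(cid) == 0 || cid[0] != 'b' { // multibase prefix 'b' = base32 lowercase
		return 0, nil, errors.New("only the base32 multibase prefix is handled in this sketch")
	}
	raw, err := base32Lower.DecodeString(cid[1:])
	if err != nil {
		return 0, nil, err
	}
	version, n := binary.Uvarint(raw)
	if n <= 0 || version != 1 {
		return 0, nil, errors.New("expected CIDv1")
	}
	raw = raw[n:]
	codec, n = binary.Uvarint(raw) // e.g. dag-pb (unixfs) or raw leaves
	if n <= 0 {
		return 0, nil, errors.New("truncated codec")
	}
	raw = raw[n:]
	hashCode, n := binary.Uvarint(raw)
	if n <= 0 || hashCode != 0x12 { // 0x12 = sha2-256 in the multihash table
		return 0, nil, errors.New("only sha2-256 is handled in this sketch")
	}
	raw = raw[n:]
	length, n := binary.Uvarint(raw)
	if n <= 0 || length != 32 || uint64(len(raw[n:])) != length {
		return 0, nil, errors.New("bad sha2-256 digest length")
	}
	return codec, raw[n:], nil
}

func main() {
	// Illustrative only; a real base32 CIDv1 taken from an ipfs:// URL goes here.
	codec, digest, err := decodeCIDv1("bafy...")
	fmt.Println(codec, digest, err)
}
```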
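And for part 2, a rough sketch of the CAR framing and per-block validation loop; the unixfs/DFS bookkeeping is only hinted at in comments, and helpers like verifyCARStream and splitSection are made up for this mail:

```go
// Sketch of the CAR framing and per-block validation loop: read one
// varint-framed section at a time, split it into (CID, data), hash the
// data and compare it to the digest inside the CID. Handling of CIDv0,
// identity hashes and non-sha256 multihashes is deliberately left out.
package main

import (
	"bufio"
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxBlockSize = 2 << 20 // 2MiB, the current de-facto maximum block size

// verifyCARStream consumes a CARv1 stream and fails as soon as a block
// does not hash to the digest embedded in its CID.
func verifyCARStream(r io.Reader) error {
	br := bufio.NewReader(r)

	// The first varint-framed section is the CAR header (DAG-CBOR with
	// the version and roots); this sketch simply skips over it.
	if err := skipSection(br); err != nil {
		return err
	}

	for {
		length, err := binary.ReadUvarint(br)
		if err == io.EOF {
			return nil // clean end of stream
		}
		if err != nil {
			return err
		}
		if length == 0 || length > maxBlockSize {
			return errors.New("unreasonable section length")
		}
		section := make([]byte, length)
		if _, err := io.ReadFull(br, section); err != nil {
			return err
		}

		// A section is the binary CID followed by the block payload.
		digest, payload, err := splitSection(section)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(payload)
		if !bytes.Equal(sum[:], digest) {
			return errors.New("block does not match its CID, aborting")
		}

		// In the real state machine this is where the block would be
		// matched against the top of the DFS stack of expected CIDs,
		// its unixfs (dag-pb) links pushed onto that stack, and its
		// file bytes handed to the consumer.
		fmt.Println("verified block of", len(payload), "bytes")
	}
}

// skipSection reads one varint-framed section and throws it away.
func skipSection(br *bufio.Reader) error {
	length, err := binary.ReadUvarint(br)
	if err != nil {
		return err
	}
	_, err = io.CopyN(io.Discard, br, int64(length))
	return err
}

// splitSection walks over the binary CIDv1 at the start of a section and
// returns its sha2-256 digest plus the remaining block payload.
func splitSection(section []byte) (digest, payload []byte, err error) {
	rest := section
	for i := 0; i < 3; i++ { // varints: CID version, content codec, multihash code
		v, n := binary.Uvarint(rest)
		if n <= 0 {
			return nil, nil, errors.New("truncated CID")
		}
		if i == 2 && v != 0x12 { // 0x12 = sha2-256
			return nil, nil, errors.New("only sha2-256 handled in this sketch")
		}
		rest = rest[n:]
	}
	size, n := binary.Uvarint(rest)
	if n <= 0 || size != 32 || uint64(len(rest[n:])) < size {
		return nil, nil, errors.New("bad digest length")
	}
	rest = rest[n:]
	return rest[:32], rest[32:], nil
}

func main() {
	// Feeding garbage just to show the call shape; a real caller would pass
	// the HTTP response body of a trustless gateway request.
	fmt.Println(verifyCARStream(strings.NewReader("not a CAR stream")))
}
```

The real code does considerably more (the unixfs protobuf decoding, the DFS stack itself, more bases and hashes), but this is the shape of the code that needs to live somewhere.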
I am not thinking about reusing this code in other projects which don't use libcurl right now, so that is not a concern. It is a maintenance question: is having this code in curl acceptable? I can pledge time to maintain the IPFS parts in curl if needed (fix bugs and help with reviews). If not, how should I go about it? Write, maintain and distribute my own .h and .so which curl can target? I'm scared by the absence of a C module-management story (I'm used to the exceptional Go modules) and I've heard more than once: "we just use libc, openssl, zlib, libcurl, ... (stable libraries a huge share of Linux systems already ship with) because adding anything else makes devops too hard". Note: I say .so, not .c, because if I were to maintain my own separate greenfield library I would like to use Zig instead of C. I don't need an answer right now on this one; it is fair to make a decision once I have a pull request up. Thx