Hi,

a friend of mine is working on "modernizing" Debian's Apt. Apparently it has been running some obscure hand-written HTTP code, resulting in very low performance, not to mention potential security issues and missing features. He is therefore rewriting Apt to use curl.

Apt uses a hash function (MD5) to verify its downloads. The existing curl codebase already contains the Metalink implementation, which also does hash verification, but only _after_ a file has been downloaded. However, a hash function can be fed while the download is still in progress. Especially when operating on big files, this improves performance dramatically (in the sense of not having to wait for md5sum after the download). On the command line this is trivial:

    curl "$URL" | tee download | md5sum

With the curl library this is also trivial: just use the WRITEFUNCTION/WRITEDATA callback to feed the hash function, as in the getinmemory example shipped with curl.

However, in order to do this one needs to link against a library providing a hash function. Curl already has such dependencies and even has a small abstraction layer for MD5, but it is not exported and is only used internally. Projects like Apt would depend on curl, which in turn would depend on a TLS library (in Debian's case GnuTLS). When implementing the MD5 hashing, one would need to use a hash function from this crypto library, possibly by copying curl's abstraction layer over the many TLS/crypto libraries yet again.

It is noteworthy that this copy'n'pasting has already happened inside curl to some extent:

    lib/md5.c
    src/tool_metalink.c (albeit abstracting over more hash functions)

While looking into this I also noticed that the metalink code does the verification _after_ the download, which Daniel also mentions [0]. In the RFCs describing the headers and the XML format I found no mention of when the hash has to be computed. Why not do it while downloading?

Should we export the awesome abstractions curl offers for hashes, and possibly also for TLS (the VTLS layer), to the outside?

Should we add HASHFUNCTION to CURLoption, so that curl would automatically compute the hash of a download while downloading? (This would be somewhat easy, I figure.)

Shouldn't the metalink implementation make use of the MD5 abstraction already in place?

One way or the other, to make Debian's Apt less horrible, one would like to have hashing while downloading.

Regards,
Leon

[0] http://daniel.haxx.se/blog/2012/06/03/curling-the-metalink/
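
For illustration, the "hash while downloading" callback mentioned above might look roughly like the sketch below. It is only a sketch under assumptions: the names hash_sink, hash_sink_init and hash_write_cb are made up for this example, and FNV-1a is used purely as a dependency-free stand-in for MD5, so that the snippet does not need to link against any crypto library. Only the callback signature is taken from libcurl's write callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-download state: a running hash, a byte counter and an
 * optional output file. FNV-1a is only a stand-in here; a real program
 * would keep an MD5 (or stronger) context from its crypto library. */
struct hash_sink {
  uint32_t hash;
  size_t total;
  FILE *out; /* may be NULL if the data should only be hashed */
};

static void hash_sink_init(struct hash_sink *s, FILE *out)
{
  s->hash = 2166136261u; /* FNV-1a 32-bit offset basis */
  s->total = 0;
  s->out = out;
}

/* Has the same signature as a libcurl write callback, so it could be
 * registered with CURLOPT_WRITEFUNCTION. Each chunk is hashed as it
 * arrives and then, optionally, written to disk. */
static size_t hash_write_cb(char *ptr, size_t size, size_t nmemb,
                            void *userdata)
{
  struct hash_sink *s = (struct hash_sink *)userdata;
  size_t len = size * nmemb;
  size_t i;

  assert(s != NULL);

  for(i = 0; i < len; i++) {
    s->hash ^= (unsigned char)ptr[i];
    s->hash *= 16777619u; /* FNV-1a 32-bit prime */
  }
  s->total += len;

  /* Returning a count different from len signals an error to the caller. */
  if(s->out && fwrite(ptr, 1, len, s->out) != len)
    return 0;
  return len;
}
```

With libcurl, such a callback would be hooked up via curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, hash_write_cb) and curl_easy_setopt(handle, CURLOPT_WRITEDATA, &sink). Once the transfer returns, the hash is already complete and no second pass over the file is needed.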
