On Fri, Dec 21, 2007 at 04:12:49AM +0300, Dmitry Kurochkin wrote:
> I have completed initial work on libwww pipelining. Output of darcs whatsnew
> is attached (sorry for that, I will try to make a proper patch tomorrow).
>
> What is done:
> - libcurl functionality is implemented using libwww. Now pipelining works.
> - The new Libcurl module provides 3 functions:
>   * copyUrl - same as copyUrl from Curl.hs. It uses copyUrls and waitNextUrl.
>   * copyUrls - takes a (filename, url) list, creates requests and adds
>     them to libwww. Does not load anything.
>   * waitNextUrl - starts the libwww event loop and blocks until the first
>     url loads (or an error happens). After it returns it should be possible
>     to add more urls to the queue using copyUrls again. waitNextUrl should
>     be called as many times as there are urls in the queue.
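For reference, the interface described above would presumably look something
like the following in Haskell; the names come from Dmitry's description, but
the type signatures (and the argument order of copyUrl) are my guess, not
anything taken from the patch:

    module Libcurl ( copyUrl, copyUrls, waitNextUrl ) where

    -- Queue a single url and block until it has been fetched.
    copyUrl :: FilePath -> String -> IO ()
    copyUrl f url = copyUrls [(f, url)] >> waitNextUrl

    -- Create a libwww request for each (filename, url) pair and add it to
    -- the pipeline queue; nothing is downloaded yet.
    copyUrls :: [(FilePath, String)] -> IO ()
    copyUrls = undefined    -- the actual libwww bindings go here

    -- Run the libwww event loop until the next queued url finishes loading
    -- (or fails); meant to be called once per queued url.
    waitNextUrl :: IO ()
    waitNextUrl = undefined -- the actual libwww bindings go here
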
Thanks for this contribution!  I've finally gotten around to writing the
promised configure support for this, and it looks pretty nice, particularly
as a starting point for an internal API that we can use (and which can
hopefully also be supported through the curl multi API).  I've got a couple
of suggestions/questions, now that I've had time to look at the actual code.

How hard would it be to make a function

    waitForURL :: String -> IO ()

which ensures that we've already got the given URL?  This would allow us to
speculatively call copyUrls to grab stuff we expect to use later, without
keeping track of the order in which the urls were queued (so as to call
waitNextUrl the proper number of times).  I think this would be a real
improvement.

Related to this would be a feature to ignore duplicate calls to copyUrls.
This may not be supported by libwww itself, but it'd be really handy, again
for speculative triggering of downloads.

Also related to this idea: can we adjust the order of downloads in the
queue?  e.g. maybe I'd like to add a file towards the front of the queue
because I need it right now.  This might be doable if waitForURL could bump
up the priority of that URL, in case it hasn't yet been requested from the
server.

I'm thinking of situations like this: we're doing a darcs get.  This
involves grabbing all the inventory files and all the patch files from the
server.  Each inventory file has pointers to many patch files and to the
next inventory file, so we don't know how many patches there are in the
repository until we've downloaded (and read) all the inventory files.

We could get the inventory files sequentially with no pipelining, count the
patch files, and then grab the patch files with pipelining while providing
nice feedback.  But this is a bit ugly: we waste all that time grabbing the
inventory files and waiting out the full latency for each one, when we
already know where a whole bunch of patch files are that we could be
grabbing.

So a faster alternative would be, once we have the first inventory file, to
queue up the second inventory file and also all the patch files listed in
the first inventory.  Then when we get the second inventory, we queue up the
third inventory and all the patch files listed in the second inventory, and
so on.  This is ugly (with the current API) because we won't get the last
inventory until we've already downloaded almost all the patch files.  It's
very fast (everything is pipelined), but because we've got a FIFO queue, the
third inventory can't be grabbed until we've already gotten all the patch
files from the first inventory, so we can't give nice feedback counting the
number of patch files we've got versus the total number.

Which is why it'd be nice to be able to prioritize the inventory files that
we're waiting on: we queue up the second inventory followed by all the
patches listed in the first inventory, but then when we get the second
inventory, we slip the third inventory in at the head of the queue.  So we
get all the inventories pretty quickly (although probably not as quickly as
if we took the first approach) and we're also interleaving the downloading
of patch files, keeping the pipe full (in theory, anyhow).

> At the moment the only place where copyUrls is used is the get command.
> But I hope this interface is enough for Darcs. If not - we need to think
> of something more complex. Waiting for comments here.

Hmmm.  I think my comments are above.  It's actually not a bad interface as
is, but waitNextUrl seems a bit awkward to use.
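To make the interleaved approach concrete, here is roughly how the get loop
could look if we had waitForURL.  Every name and type below (including the
readInventory helper) is an assumption of mine, not code from the patch:

    type URL = String

    -- Assumed primitive: queue downloads without waiting (speculative).
    copyUrls :: [(FilePath, URL)] -> IO ()
    copyUrls = undefined

    -- Assumed primitive: block until the given url has been fetched,
    -- bumping its priority if it is still sitting in the queue.
    waitForURL :: URL -> IO ()
    waitForURL = undefined

    -- Assumed helper: parse an inventory file we already have on disk,
    -- returning the patches it lists and the next inventory, if any.
    readInventory :: FilePath -> IO ([(FilePath, URL)], Maybe (FilePath, URL))
    readInventory = undefined

    -- Walk the chain of inventories, queueing each inventory's patches as
    -- soon as we've read it, and always waiting on (i.e. prioritizing) the
    -- next inventory so the chain is never stuck behind queued patches.
    getRepo :: (FilePath, URL) -> IO ()
    getRepo (invFile, invUrl) = do
        copyUrls [(invFile, invUrl)]
        waitForURL invUrl
        patches <- walk invFile
        -- everything is queued and the total is known, so we could give
        -- nice feedback while waiting for the remaining patch files here
        mapM_ (waitForURL . snd) patches
      where
        walk inv = do
            (patches, next) <- readInventory inv
            case next of
              Nothing -> do
                  copyUrls patches
                  return patches
              Just (nextFile, nextUrl) -> do
                  -- queue the next inventory along with this inventory's
                  -- patches, then wait on the inventory so it jumps ahead
                  copyUrls ((nextFile, nextUrl) : patches)
                  waitForURL nextUrl
                  rest <- walk nextFile
                  return (patches ++ rest)

The point is that the chain of inventories never gets stuck behind
already-queued patch files, while the patch downloads still keep the pipe
full in the background.
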
Actually, it has now occurred to me that we could implement waitForURL as a
wrapper around waitNextUrl, if we kept tabs on what had been shoved into the
queue.  It seems a bit ugly, but we could live with that sort of solution if
libwww doesn't have this functionality.

> What is missing:
> - DARCS_PROXYUSERPWD is not used (but http_proxy works).
> - Proper error handling.
> - Not tested.
> - ???

These issues are somewhat less critical now that this code can coexist with
the libcurl code.  Only interested users are likely to use the new code, so
it'll have a bit of time to mature.

I haven't yet done any performance testing myself.  That comes next (and
requires using my laptop, since the network of my work computer is too fast
for this to have a noticeable effect, as far as I can tell).

I expect I'll be applying this soon to the unstable repository.

David
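P.S. For concreteness, here's a rough sketch of the sort of wrapper I have
in mind, assuming waitNextUrl can tell us which url it just finished (or
that we track the FIFO order ourselves).  None of this is code from the
patch:

    import Control.Monad (unless)
    import Data.IORef (IORef, readIORef, modifyIORef)
    import qualified Data.Set as Set

    type URL = String

    -- Assumed primitive: run the libwww event loop until the next queued
    -- url has finished, and report which one it was.
    waitNextUrl :: IO URL
    waitNextUrl = undefined

    -- Keep pumping the queue until the url we care about has arrived,
    -- remembering everything that finishes along the way.
    waitForURL :: IORef (Set.Set URL) -> URL -> IO ()
    waitForURL doneRef url = do
        done <- readIORef doneRef
        unless (url `Set.member` done) $ do
            finished <- waitNextUrl
            modifyIORef doneRef (Set.insert finished)
            waitForURL doneRef url

The caller would create the set once with newIORef Set.empty and thread it
through (or we could hide it inside the module).  Not pretty, but it would
give us waitForURL without any help from libwww.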
