On Fri, Jul 29, 2005 at 04:21:08PM +0200, Florian Weimer wrote:
> Has anybody thought about improving pull performance?
>
> I think it might be useful to add a cache for the remote
> _darcs/inventories/* and _darcs/inventory files, and use zsync to make
> downloads of _darcs/inventory incremental.

Are you thinking about optimizing the "no changes" case?

It seems like there are a few issues here, and I'd rather not address
an optimization at the transport level if there are logical-level
optimizations that could make it redundant. I.e. rather than caching
to avoid transport, I'd like to avoid downloading any data we don't
need.

I don't see any reason why we should need zsyncish optimizations for
fetching the inventory, unless perhaps the inventory is very large
because there aren't any tags. And as long as the inventory is small,
latency will dominate, and a simple download should beat zsync in
speed.

Tagging regularly and optimizing can reduce the size of
_darcs/inventory, which helps as long as you don't need to delve into
_darcs/inventories/. When darcs tag is run, the inventory is
automatically split, but when one pushes a tag this doesn't happen.
Perhaps a flag to make apply automatically optimize when it applies a
tag would be helpful.

In many cases, optimize --reorder can help prevent the need to delve
into _darcs/inventories/. Perhaps we could consider an option to pull
which would reorder to match the remote repository--this helps not
just with transport, but also with the amount of commutation needed to
perform the pull.

We may also be able to create improved versions of the
get_common_and_uncommon and related algorithms which would be more
asymmetric, in that they'd try to access less of the remote repository
and more of the local one. This could be tricky (since those
algorithms are tricky), but would be an improvement that could
relatively easily benefit several commands, and could downright
eliminate the need to look at any _darcs/inventories/ files that
correspond to tags that we have in our local repository.

> zsync is available here: <http://zsync.moria.org.uk/>
> It doesn't need any special server support, it's all client-side.

Well, it does require that .zsync files be built (or rather stored) on
the server, containing checksum information.
Darcs would have to be in charge of updating the .zsync files... but
there's no reason that should be a problem.

A related idea would be to leverage the proposed hashed inventories
(which would of course have to be implemented) to (optionally) store a
cache of all _darcs/inventories/* and patches in a centralized
location. With hashed inventories (also useful for signed
repositories) we'd then be able to avoid ever downloading the same
patch or inventory file twice. This wouldn't handle the large
_darcs/inventory issue, but would essentially eliminate the cost of
downloading _darcs/inventories/ (since those files very rarely
change).

(And I keep hoping that someone will implement the hashed inventories
idea... I'd help with design, critiquing and especially with the
RepoFormat code to enable forwards and backwards compatibility.)

Back to the subject of pull, if you have a particular "common case"
scenario that you think could use improvement, I think it'd be helpful
to discuss that case in detail before deciding what would be the best
way to improve darcs' performance.
-- 
David Roundy
http://www.darcs.net

_______________________________________________
darcs-devel mailing list
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
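
The "more asymmetric" comparison discussed above can be illustrated
with a minimal Python sketch. It is a simplified model, not darcs'
actual get_common_and_uncommon: it assumes that a tag depends on every
patch that precedes it, so finding the remote's most recent tag in the
local repository implies that everything the tag covers is already
local. The names patches_to_pull, remote_head, remote_head_tag and
fetch_older are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Optional, Set


def patches_to_pull(
    local_patches: Set[str],
    remote_head: List[str],
    remote_head_tag: Optional[str],
    fetch_older: Callable[[], List[str]],
) -> List[str]:
    """Return the remote patch names that are missing locally.

    remote_head     -- patches listed after the remote's most recent tag
                       (modelled on the contents of _darcs/inventory)
    remote_head_tag -- name of that tag, or None if the remote has no tag
    fetch_older     -- callback that downloads the older patch names
                       (modelled on _darcs/inventories/); it is only
                       called when it cannot be avoided
    """
    missing_from_head = [p for p in remote_head if p not in local_patches]

    # Assumption: holding the tag locally means holding every patch it
    # covers, so the older remote inventories need not be fetched at all.
    if remote_head_tag is not None and remote_head_tag in local_patches:
        return missing_from_head

    # Otherwise fall back to the expensive path and inspect the history.
    older = fetch_older()
    return [p for p in older if p not in local_patches] + missing_from_head
```

In this model the cost of a pull depends only on the size of the
remote's current inventory whenever the local repository already
contains the remote's latest tag.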

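The hashed-inventory cache mentioned above could look roughly like the
following Python sketch. It is only an illustration of the idea that a
file named by the hash of its contents never needs to be downloaded
twice. The choice of SHA-256, the flat cache directory layout, and the
names fetch_cached and download are all assumptions made for this
example; the message does not specify any of them.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_cached(
    cache_dir: Path,
    content_hash: str,
    download: Callable[[str], bytes],
) -> bytes:
    """Return the file whose SHA-256 hex digest is content_hash.

    The shared cache is consulted first; download() is only called on a
    cache miss, and its result is verified before being stored.
    """
    cached = cache_dir / content_hash
    if cached.exists():
        # Content-addressed files never change, so a hit is always valid.
        return cached.read_bytes()

    data = download(content_hash)
    if hashlib.sha256(data).hexdigest() != content_hash:
        raise ValueError("downloaded data does not match " + content_hash)

    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

Because the file name is derived from the contents, one cache
directory can safely be shared between any number of local
repositories.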