On Fri, Aug 05, 2005 at 04:38:04PM +0200, Florian Weimer wrote:
> * David Roundy:
> > I.e. rather than caching to avoid transport, I'd like to avoid
> > downloading any data we don't need. I don't see any reason why we
> > should need zsyncish optimizations for fetching the inventory,
> > unless perhaps the inventory is very large because there aren't any
> > tags.
>
> My benchmark is John Goerzen's fptools repository (created from
> fptools/GHC CVS, see <http://darcs.complete.org/fptools/>).
>
> Your suggestion seems to imply that I wouldn't have to download 4
> megabyte of inventory data if John tagged his repository regularly.
> Is this true? Below, you mentioned something about push not splitting
> the inventory, would this be relevant in this case?
Ah, this is an automatically generated repository. If he tagged it
regularly and ran optimize on the main server, there wouldn't be a
large inventory to download each time. It looks like the fptools repo
was already tagged and optimized sometime before July 12, so unless
the dates on the server are wrong, the slowness you're seeing may be
due to some other issue. Are you pulling into an unmodified
repository, or one with local patches? It may be that you need to run
optimize locally, or even optimize --reorder, in order to benefit from
the inventory-splitting. If neither of these solves the problem, we've
probably got an inefficiency somewhere that can be improved.

Even if one of them does solve the problem, we still *ought* to be
able to avoid downloading the 4MB of old history, since you already
have all that information locally. The code to be improved is most
likely in Depends, and improving it would also help us better handle
--partial repositories. Ideally, we'd have functions like
get_common_and_uncommon that are smart enough to avoid one of the two
sides when possible, but able to go in the other direction if
necessary. The problem, of course, is that it's hard to know where
the best tradeoff lies (i.e. how much commutation of local patches is
worth the trouble to avoid downloading more inventory, especially
since there is no cheap way to determine how much commutation will be
required).

The other side of the solution would be to implement hashed
inventories, which would make (optionally) caching both remote
patches and remote inventories cheap and easy.
--
David Roundy
http://www.darcs.net
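
As a rough illustration of the get_common_and_uncommon idea above,
here is a sketch in Haskell. The names (PatchId, splitAtLastTag,
commonAndUncommon) and the inventory representation are hypothetical,
not darcs's actual Depends API; the point is just that once both
inventories are split at their most recent shared tag, only the short
recent portions need to be compared (or downloaded) at all:

import qualified Data.Set as Set

type PatchId = String   -- stand-in for a real patch identifier

-- Split an inventory (oldest first) at its most recent tag, returning
-- the older portion (up to and including the tag) and the patches
-- that came after it.
splitAtLastTag :: (PatchId -> Bool) -> [PatchId] -> ([PatchId], [PatchId])
splitAtLastTag isTag inv =
  let (recent, rest) = break isTag (reverse inv)
  in (reverse rest, reverse recent)

-- Given the recent portions of two inventories that share the same
-- most recent tag, work out which patches are common and which are
-- only on one side; everything before the tag is common by
-- construction and never needs to be looked at.
commonAndUncommon :: [PatchId] -> [PatchId]
                  -> ([PatchId], [PatchId], [PatchId])
commonAndUncommon ours theirs = (common, onlyOurs, onlyTheirs)
  where ourSet     = Set.fromList ours
        theirSet   = Set.fromList theirs
        common     = filter (`Set.member` theirSet) ours
        onlyOurs   = filter (`Set.notMember` theirSet) ours
        onlyTheirs = filter (`Set.notMember` ourSet) theirs

The hard part mentioned in the message, deciding how much commutation
of local patches is worth doing to avoid fetching more inventory, is
not captured here; this only shows the bookkeeping once the relevant
inventory portions are in hand.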
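
And a similarly rough sketch of the hashed-inventory idea: if
inventories and patches are stored under a hash of their contents, a
client can cache anything it has already fetched and skip the download
entirely next time. The function names and the choice of SHA-256 here
are assumptions for illustration, not what darcs actually implements:

import qualified Data.ByteString.Lazy as BL
import Data.Digest.Pure.SHA (sha256, showDigest)
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

-- Name a blob (an inventory fragment or a patch) by the hash of its
-- contents.
hashName :: BL.ByteString -> FilePath
hashName = showDigest . sha256

-- Store a blob under its hash and return the name to record in the
-- parent inventory.
cachePut :: FilePath -> BL.ByteString -> IO FilePath
cachePut cacheDir contents = do
  createDirectoryIfMissing True cacheDir
  let name = hashName contents
  BL.writeFile (cacheDir </> name) contents
  return name

-- Fetch a blob by name, using the local cache when possible;
-- 'download' stands in for whatever transport is actually in use.
cacheGet :: FilePath -> (FilePath -> IO BL.ByteString) -> FilePath
         -> IO BL.ByteString
cacheGet cacheDir download name = do
  let path = cacheDir </> name
  cached <- doesFileExist path
  if cached
    then BL.readFile path
    else do contents <- download name
            createDirectoryIfMissing True cacheDir
            BL.writeFile path contents
            return contents

Because the name is derived from the contents, a cached file can never
go stale, which is what makes caching remote patches and inventories
cheap and easy.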
