On Thu, Jan 03, 2008 at 08:40:52AM +1100, James D Sadler wrote:
> A 150 MB patch is far from ideal, yes, but I disagree that it is an
> error on our part. We can perform gets and pulls just fine at work,
> and that means that darcs is handling that patch just fine in that
> situation. It doesn't handle that patch when trying a pull that
> attempts to pull that one patch *only*. Darcs consumes inordinate
> amounts of RAM in order to do this - my guess is that Darcs is
> scanning other patches in order to check for dependency relationships
> with the patch I need to pull.

No, it's just that darcs is optimized for the common case. In the
common case of pulling, the remote repository has only a relatively
small number of patches that we do not have locally, and reading them
all into memory makes sense. When pulling into an empty repository,
this means that one always reads (and parses) the entire remote
repository into memory. Darcs doesn't need to check any dependency
relationships; it just grabs everything into memory so it can nicely
prompt you interactively to see which changes you want.

One slow but cheap (in terms of memory use) approach would be to start
with a current repository and unpull patches one at a time (possibly
optimizing in between). Each step would be fast and use little memory,
since each operation is a "normal" operation. Also more efficient than
pulling into an empty repository would be darcs get --to-patch, or the
like.

> While writing a tool a couple of months ago to extract the content of
> a darcs repo without invoking darcs itself, I got quite familiar with
> the patch format. Something that stood out immediately was that
> patches do not contain references to the patches that they depend on.
> Essentially that means that darcs has to do a *lot* of work in order
> to figure out the dependency relationships - I think it is here where
> it loads all the patches into RAM and uses the 'patch algebra' to make
> its conclusions. This is the error, IMHO. (I am trying to tread
> carefully here, it's not my intention to start a flame war :0) )

The alternative is making darcs record O(N), where N is the size of
the repository. That's unacceptable. Also, it would make the disk use
of a repository O(N^2) in the (uncommon) worst-case scenario.
-- 
David Roundy
Department of Physics
Oregon State University
_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
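
The O(N^2) disk-use figure is not derived in the message. One plausible reading, offered here only as a sketch, is the following. Assume (this is not stated in the message) that each patch would store an explicit, constant-size reference to every patch it depends on, and that in the worst case patch number k depends on all k-1 patches before it. The symbols d_k (references stored in patch k) and D(N) (total references in a repository of N patches) are introduced purely for illustration.

```latex
% Worst case: patch k references every earlier patch.
d_k = k - 1, \qquad k = 1, \dots, N

% Total number of stored references across the repository:
D(N) = \sum_{k=1}^{N} d_k = \sum_{k=1}^{N} (k - 1) = \frac{N(N-1)}{2} = O(N^2)
```

Under the same assumptions, recording the N-th patch would mean checking it against up to N-1 existing patches to build its reference list, which would be consistent with the O(N) cost of record mentioned above.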
