For the scenario you describe, the idea of working in the SlurpMonad is a good one. It means you have the contents of all modified files in memory, which could be horribly expensive, so you'd only want to do this when passed a flag (--store-in-memory, or something).
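To make the in-memory idea concrete, here is a toy Haskell model of applying patches to a tree held entirely in memory. It also covers the pruning and "sloppy" application described in steps (2) and (3) below. This is a minimal sketch under stated assumptions: `MiniSlurpy`, `MiniHunk`, `SlurpError`, `applyStrict`, `restrictTo` and `applySloppy` are all hypothetical names, and they are far simpler than darcs's real Slurpy and SlurpMonad. To keep the sketch short, the sloppy layer is a plain wrapper function around the strict applier rather than a monad nested on the existing one.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- HYPOTHETICAL toy model, not darcs's real API: the whole working tree is
-- held in memory as path -> file contents (as a list of lines).
type MiniSlurpy = Map.Map FilePath [String]

-- HYPOTHETICAL hunk: file, 1-based start line, old lines, new lines.
data MiniHunk = MiniHunk FilePath Int [String] [String]
  deriving Show

data SlurpError
  = MissingFile FilePath  -- the hunk touches a file the slurpy doesn't hold
  | BadHunk FilePath      -- the old lines don't match the file contents
  deriving (Show, Eq)

-- Strict application, in the spirit of the existing SlurpMonad:
-- touching a file that isn't in the slurpy is an error.
applyStrict :: MiniSlurpy -> MiniHunk -> Either SlurpError MiniSlurpy
applyStrict s (MiniHunk path line old new) =
  case Map.lookup path s of
    Nothing -> Left (MissingFile path)
    Just ls ->
      let (before, rest) = splitAt (line - 1) ls
      in if take (length old) rest == old
           then Right (Map.insert path (before ++ new ++ drop (length old) rest) s)
           else Left (BadHunk path)

-- Step (1): apply every hunk in memory; writing to disk happens afterwards.
applyAllStrict :: MiniSlurpy -> [MiniHunk] -> Either SlurpError MiniSlurpy
applyAllStrict = foldM applyStrict

-- Steps (2)/(3): drop every file except the ones we actually want to diff.
restrictTo :: Set.Set FilePath -> MiniSlurpy -> MiniSlurpy
restrictTo wanted = Map.filterWithKey (\path _ -> path `Set.member` wanted)

-- Step (3): a "sloppy" layer on top of applyStrict. A MissingFile error is
-- taken to mean "this file was pruned on purpose", so the hunk is skipped;
-- any other error still propagates.
applySloppy :: MiniSlurpy -> MiniHunk -> Either SlurpError MiniSlurpy
applySloppy s h = case applyStrict s h of
  Left (MissingFile _) -> Right s
  result -> result

applyAllSloppy :: MiniSlurpy -> [MiniHunk] -> Either SlurpError MiniSlurpy
applyAllSloppy = foldM applySloppy
```

For example, suppose a two-file tree is pruned with `restrictTo` down to the one file you want to diff. `applyAllStrict` then fails with `MissingFile` for the other file. `applyAllSloppy` skips that hunk but still reports a hunk that doesn't match. The catch is that the sloppy layer cannot tell a deliberately pruned file from a genuinely missing one, so it only makes sense on a Slurpy you pruned yourself.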
I don't expect to have much time to help with this (sorry), but it would be a nice improvement. There are a few (testable) steps one could take:
(1) Do all patch application in memory and then write the results to disk. This saves essentially nothing, but it does the SlurpMonad work in a way that is easy to check.
(2) Do (1), but when writing to disk, only write the files (from the Slurpy) that you actually want to diff. This is pretty easy once (1) is done: you just need a special function for writing the Slurpy to disk, or alternatively a function that removes all but a selected group of files from a Slurpy. The latter option would also help with (3) below.
(3) Remove unwanted files from the Slurpy *before* applying patches. This saves lots of memory, but it is very tricky, because it requires that the SlurpMonad not fail when modifying a file that isn't stored in your Slurpy. There are a couple of ways you could handle this. The easiest would probably be to create a new "SloppySlurpMonad" that simply ignores errors about nonexistent files, layered on top of the existing monad. This sounds scary, but it would actually be relatively little code, and pretty easy (and fun!) to implement.
Hope this helps, and that you (the original poster) plow ahead and make this optimization...
David
On Thu, Aug 24, 2006 at 08:48:39PM +0100, Grant Husbands wrote:
> Jason Dagit wrote:
> >Just wondering, what characterizes your 'large repositories'?
>
> I can answer this one. The main repo he's thinking of contains 2,464
> files in 148 folders, totalling 25MB. Most are text files and they
> rarely exceed 20KB. There are no files larger than 2MB, and only a few
> larger than 1MB.
>
> There are 1,504 patches; 1,200 of those are each, gzipped, smaller than
> 1KB. People normally apply diffs to recent patches and the total size of
> the files affected by all of the patches ahead of them would very rarely
> exceed 1MB. However, the time it takes to create copies of the 25MB of
> small files is appreciable.
>
> You don't need a repo of that size to see the effect, though. Doing a
> darcs diff on a recent patch in darcs-stable takes an appreciable amount
> of time, where all that time is spent copying files.
>
> >Another idea I had to optimize darcs was to store hunks differently so
> >that parts of patch bundles could be skipped over instead of parsed.
> >I partially implemented this and found that it was actually slower
>
> For what it's worth, this wouldn't help in our case; we normally only
> diff patches near to the current state, so the time spent parsing
> patches is trivial.
>
> I'm not really qualified to comment on any of the rest. Any pointers
> would be handy, of course.
>
> G.
_______________________________________________
darcs-devel mailing list
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
