For the scenario you describe, the idea of working in the SlurpMonad
is a good one.  It means you have the contents of all modified files
in memory, which could be horribly expensive, so you'd only want to do
this when passed a flag (--store-in-memory, or something).

I don't expect to have time to help with this much (sorry), but it'd
be a nice improvement.  There are a few (testable) steps one could take:

(1) Do all patch application in memory and then write the result to
disk (which saves essentially nothing, but exercises the SlurpMonad in
such a way that it's easily checkable).

(2) Do (1), but when writing to disk, only write the files (from the
Slurpy) that you actually want to diff.  Pretty easy after (1), as you
just need either a special function that writes only those files from
the Slurpy to disk, or a function that removes all but a select group
of files from a Slurpy.  The latter option would help (3) below.

(3) Remove unwanted files from the Slurpy *before* applying patches.
This saves lots of memory, but is very tricky, as it requires that the
SlurpMonad not fail when modifying a file that isn't stored in your
Slurpy.  There are a couple of ways you could handle this.  The
easiest would probably be to create a new "SloppySlurpMonad", which
just ignores errors involving files not existing.  This new monad
would be layered on top of the existing monad.  This sounds scary, but
would actually be relatively little code, and pretty easy (and fun!)
to implement.
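
To make steps (2) and (3) a bit more concrete, here is a rough,
self-contained sketch.  None of it is darcs's actual code: Slurpy,
Slurp, SlurpError, modifyFile, restrictTo and sloppy are all
hypothetical stand-ins (a Slurpy here is just a map from path to
contents, and the "sloppy" layer is a combinator rather than a full
separate monad), so treat it as an illustration of the idea only.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a Slurpy: file path -> file contents.
type Slurpy = M.Map FilePath String

-- Hypothetical error type; only NoSuchFile is ever ignored below.
data SlurpError = NoSuchFile FilePath | OtherError String
  deriving (Eq, Show)

-- Hypothetical stand-in for the SlurpMonad: a computation that
-- threads a Slurpy through and may fail.
newtype Slurp a = Slurp
  { runSlurp :: Slurpy -> Either SlurpError (Slurpy, a) }

instance Functor Slurp where
  fmap f (Slurp g) = Slurp $ \s -> fmap (fmap f) (g s)

instance Applicative Slurp where
  pure x = Slurp $ \s -> Right (s, x)
  Slurp f <*> Slurp g = Slurp $ \s -> do
    (s1, h) <- f s
    (s2, x) <- g s1
    pure (s2, h x)

instance Monad Slurp where
  Slurp g >>= k = Slurp $ \s -> do
    (s1, x) <- g s
    runSlurp (k x) s1

-- Strict modification: fails if the file is not in the Slurpy.
modifyFile :: FilePath -> (String -> String) -> Slurp ()
modifyFile path f = Slurp $ \s -> case M.lookup path s of
  Nothing -> Left (NoSuchFile path)
  Just c  -> Right (M.insert path (f c) s, ())

-- Steps (2)/(3): drop everything except a select group of files.
restrictTo :: [FilePath] -> Slurpy -> Slurpy
restrictTo wanted = M.filterWithKey (\p _ -> p `elem` wanted)

-- The "sloppy" layer: a missing file is silently skipped and the
-- Slurpy is left unchanged; every other error still propagates.
sloppy :: Slurp () -> Slurp ()
sloppy (Slurp g) = Slurp $ \s -> case g s of
  Left (NoSuchFile _) -> Right (s, ())
  other               -> other
```

With this, running a sequence of sloppy (modifyFile ...) actions over
restrictTo wanted slurpy touches only the files you kept, while the
plain modifyFile still fails on a missing file, which keeps step (1)
checkable.  Note that a patch applied sloppily to a pruned Slurpy no
longer tells you anything about whether it would have applied to the
full tree, which is part of why (3) is the tricky step.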

Hope this helps, and that you (the original poster) plow ahead and
make this optimization...

David

On Thu, Aug 24, 2006 at 08:48:39PM +0100, Grant Husbands wrote:
> Jason Dagit wrote:
> >Just wondering, what characterizes your 'large repositories'?
> 
> I can answer this one. The main repo he's thinking of contains 2,464 
> files in 148 folders, totalling 25MB. Most are text files and they 
> rarely exceed 20KB. There are no files larger than 2MB, and only a few 
> larger than 1MB.
> 
> There are 1,504 patches; 1,200 of those are each, gzipped, smaller than 
> 1KB. People normally apply diffs to recent patches and the total size of 
> the files affected by all of the patches ahead of them would very rarely 
> exceed 1MB. However, the time it takes to create copies of the 25MB of 
> small files is appreciable.
> 
> You don't need a repo of that size to see the effect, though. Doing a 
> darcs diff on a recent patch in darcs-stable takes an appreciable amount 
> of time, where all that time is spent copying files.
> 
> >Another idea I had to optimize darcs was to store hunks differently so
> >that parts of patch bundles could be skipped over instead of parsed.
> >I partially implemented this and found that it was actually slower
> 
> For what it's worth, this wouldn't help in our case; we normally only 
> diff patches near to the current state, so the time spent parsing 
> patches is trivial.
> 
> I'm not really qualified to comment on any of the rest. Any pointers 
> would be handy, of course.
> 
> G.

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
