On 8/24/06, Edwin Thomson <[EMAIL PROTECTED]> wrote:
> Hello
>
> I'm finding darcs's diff command to be unusably slow for large
> repositories, so I'm trying to speed it up. Darcs seems to do diff by
> making two copies of the repository, with different patches applied on
> each, which results in a lot of unnecessary work in the normal case
> where we only care about a small number of files. It would be much
> better if it only touched the files it needs to.
Just wondering, what characterizes your 'large repositories'? I have
found a couple of ways in which darcs could possibly do a better job on
what I consider to be large repos but none of my planned optimizations
worked out.
I was looking into a similar problem a while back with 'darcs changes'
where even in summary mode it would look at the entire file for new
files instead of just saying, "Oh and BTW this file is new." Back
then I created a patch (which got lost for whatever reason) that just
detected this case and skipped the code to check the contents of the
file.
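A minimal sketch of that shortcut, assuming a made-up, simplified change type (Change, summarize and the constructor names are hypothetical, not darcs's real patch representation). The summary line for a new file is built from the path alone, so the contents are never inspected:

```haskell
-- Hypothetical, simplified change type; NOT darcs's patch representation.
data Change
  = AddFile FilePath String    -- path and full contents of a new file
  | EditFile FilePath Int Int  -- path, lines removed, lines added
  | RmFile FilePath            -- path of a removed file

-- One summary line per change. The AddFile case ignores the contents
-- field, so under lazy evaluation the contents are never forced.
summarize :: Change -> String
summarize (AddFile p _)    = "A " ++ p
summarize (EditFile p r a) = "M " ++ p ++ " -" ++ show r ++ " +" ++ show a
summarize (RmFile p)       = "R " ++ p
```

Because the contents field is never demanded, summarize works even when that field is an unevaluated (or very expensive) value.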
I've learned that optimizing lazy code is tricky. For instance, darcs
currently has a problem when there is a single very large file. It
seems to incorrectly hold on to too much memory. I say "seems to"
because it's not clear exactly what the memory is being used for.
Plus there are complications with the FastPackedString library where
it uses mmap (do we count the memory consumed for the mmap as used
memory or not? I'm in the camp that says, "Yeah, memory used by mmap
is a big deal and needs to be bounded at something reasonable like
maybe 64MB (runtime option)").
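As a small, generic illustration of how lazy code can hold on to memory without it being obvious (this is not darcs code, and the function names are made up for the example): the two-pass version below keeps the whole list alive until the second traversal finishes, while the one-pass version forces its accumulator at each step and can let the list be collected as it goes.

```haskell
import Data.List (foldl')

-- Two traversals: 'xs' is shared by 'sum' and 'length', so the entire
-- list must stay in memory until both have walked it.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- One traversal with a strict accumulator: both components of the pair
-- are forced at every step, so no chain of thunks builds up.
meanOnePass :: [Double] -> Double
meanOnePass xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (t, c) x =
      let t' = t + x
          c' = c + 1
      in t' `seq` c' `seq` (t', c')
```

Both functions return the same result; only the memory behaviour differs, and how large the difference is depends on the compiler and optimization flags.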
Another idea I had to optimize darcs was to store hunks differently so
that parts of patch bundles could be skipped over instead of parsed.
I partially implemented this and found that it was actually slower
than the current implementation (but could maybe be amended to be
faster, hard to tell).
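A rough sketch of the skipping idea, assuming a hypothetical length-prefixed layout (a decimal byte count, a newline, then the payload). This is not darcs's bundle format, it uses Data.ByteString.Char8 rather than FastPackedString, and encodeRecord, nextRecord and recordAt are names invented for the example:

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical record layout: decimal byte count, newline, payload.
encodeRecord :: B.ByteString -> B.ByteString
encodeRecord payload =
  B.concat [B.pack (show (B.length payload)), B.pack "\n", payload]

-- Split off one record, returning its payload and the remaining input.
-- Only the length header is parsed; the payload is sliced out by
-- offset and never inspected.
nextRecord :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
nextRecord input = do
  (len, rest) <- B.readInt input
  ('\n', body) <- B.uncons rest
  if len < 0 || len > B.length body
    then Nothing
    else Just (B.splitAt len body)

-- Skip over n records, then return the payload of the next one.
recordAt :: Int -> B.ByteString -> Maybe B.ByteString
recordAt n input
  | n < 0     = Nothing
  | n == 0    = fst <$> nextRecord input
  | otherwise = nextRecord input >>= recordAt (n - 1) . snd
```

Whether this beats plain parsing depends on how expensive the parse is compared with the extra header bookkeeping, which would match the observation that a partial implementation came out slower.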
> It looks like it ought to be possible to make patch application use a
> SlurpMonad instead of the IO monad, and then pull the modifications out.
> If I understand things correctly, I do something like this:
>
> import Patch (apply)
>
> apply_stuff_to_slurp_monad :: Stuff -> SlurpMonad ()
> apply_stuff_to_slurp_monad stuff = do
>   ...
>   ...
>   apply blah blah patch1
>   apply blah blah patch2
>
> then use it with something like:
>
> get_file_contents :: OtherStuff -> [FilePath] -> IO ([PackedString])
> get_file_contents otherstuff files = do
>   s <- slurp pristine
>   -- apply the patches first, then read back only the requested files
>   case (apply_stuff_to_slurp_monad stuff >> mapM mReadFilePS files) of
>     SM func -> func s
>
> I'm not good with Haskell, and I don't really understand monads well, so
> I'd like to know whether this approach is possible and/or sane, or
> whether there is a better way, before I spend too much time failing to
> implement it.
I'll admit, I don't know this area of darcs well enough to comment
on the code you're suggesting. But in general it seems that improving
run-time can be done by making things very lazy with just a few key
places being strict. On the other hand, doing so can lead to memory
headaches. I think in the end the best way to find out in Haskell is
to just try it.
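One cheap way to try the general shape of the idea outside darcs is a toy model. The sketch below does not use darcs's SlurpMonad, Slurpy or patch types; Tree, ToyPatch, TreeM, applyToy, readToy and contentsAfter are all made-up stand-ins. It only shows that patches can be applied to an in-memory tree in a pure state monad, with the requested files read back afterwards:

```haskell
import qualified Data.Map as M

-- Toy stand-ins (assumptions, not darcs's real types): the tree maps
-- paths to contents, and a patch either writes or deletes a file.
type Tree = M.Map FilePath String

data ToyPatch
  = PutFile FilePath String
  | DelFile FilePath

-- A minimal state monad over the in-memory tree; no IO involved.
newtype TreeM a = TreeM { runTreeM :: Tree -> (a, Tree) }

instance Functor TreeM where
  fmap f (TreeM g) = TreeM $ \t -> let (a, t') = g t in (f a, t')

instance Applicative TreeM where
  pure a = TreeM $ \t -> (a, t)
  TreeM f <*> TreeM g = TreeM $ \t ->
    let (h, t')  = f t
        (a, t'') = g t'
    in (h a, t'')

instance Monad TreeM where
  TreeM g >>= k = TreeM $ \t -> let (a, t') = g t in runTreeM (k a) t'

-- Apply one toy patch to the tree held in the monad's state.
applyToy :: ToyPatch -> TreeM ()
applyToy (PutFile p s) = TreeM $ \t -> ((), M.insert p s t)
applyToy (DelFile p)   = TreeM $ \t -> ((), M.delete p t)

-- Look up a file in the current state of the tree.
readToy :: FilePath -> TreeM (Maybe String)
readToy p = TreeM $ \t -> (M.lookup p t, t)

-- Apply the patches purely, then read back only the requested files.
contentsAfter :: [ToyPatch] -> [FilePath] -> Tree -> [Maybe String]
contentsAfter patches files tree =
  fst (runTreeM (mapM_ applyToy patches >> mapM readToy files) tree)
```

Note that the patch step and the read step are sequenced with >> rather than >>=, since the reads do not depend on the unit result of applying the patches.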
HTH,
Jason
_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel