On Mon, Aug 17, 2009 at 10:07 AM, Eric Kow <[email protected]> wrote:

> On Mon, Aug 17, 2009 at 09:47:26 -0700, Jason Dagit wrote:
> > During darcs record I noticed that darcs needed 1.6GB of memory just to
> > give me the list of hunks and adds.  I typed 'a' to accept all and darcs
> > was killed by Windows when it reached about 2GB of memory usage.
> >
> > I could be mistaken, but I thought back in the darcs 1.0.x days David
> > optimized darcs for this case and you could actually add as many files as
> > you wanted without running out of memory.
>
> Check out http://bugs.darcs.net/issue80, particularly
> http://bugs.darcs.net/msg2981 in which David says we used to have
> special case code for record which we lost in Darcs 2 for code
> cleanliness.


Indeed.  I also found this, which (I think) refers to an older approach to
solving the same problem:
http://bugs.darcs.net/msg4077

I can't argue with David's logic there.  Having a special case to allow you
to record a patch you can never work with seems bad.


>
>
> If I understand correctly, David says that the real solution is to
> implement chunky hunks http://bugs.darcs.net/issue1357


I don't know anything about chunky hunks and I don't see them mentioned or
linked in that ticket.  Where can I find more details?


>
>
> This complaint just came up on reddit yesterday.  I've added
> a troubleshooting entry for it:
>  http://wiki.darcs.net/Troubleshooting#darcs%20record-runs-out-of-memory
>
> Meanwhile, I'd be very interested to hear about ways we can improve this
> situation.  For example, I wonder if there is any way to get the
> benefits of chunky hunks without actually having to change the repo
> format?


I'm still convinced we can substantially improve this by being explicit
about hunk loading.  The objections are that it would make our core code
less elegant and that it would require a lot of coding and testing, so I can
see why writing this code has remained a low priority.

Anyway, to be more concrete, I think this is where we apply the left-fold-based
IO (iteratees) ideas that Oleg et al. came up with.  I think we could
potentially implement things like commute to only need to look at portions
of a hunk at a time; but, even if we required that commute have the entire
file in memory, we should still be able to implement other patch operations
to just process the patches in bounded buffers.  In other words, even if the
largest patch we commute is limited by what we can contiguously mmap, we
should still be able to optimize this case of adding many reasonably sized
files.  In the repository in question, all the files are "small" text files
containing source code, apart from a few tiny binary files such as icons.
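
As a rough illustration of the bounded-buffer idea, here is a minimal sketch of a strict left fold over a file that reads one fixed-size chunk at a time. It is not darcs code and not a real iteratee library; the names `foldChunks`, `countLines` and `countLinesInFiles` and the 64 KiB chunk size are made up for the example, and counting newlines merely stands in for whatever per-chunk work a patch operation would do.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (foldM)
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- | Strict left fold over the contents of a handle.  At most 'chunkSize'
-- bytes are read per step, so only one chunk plus the accumulator is live
-- at a time.  (Hypothetical helper, not part of darcs.)
foldChunks :: Int -> (acc -> B.ByteString -> acc) -> acc -> Handle -> IO acc
foldChunks chunkSize step = go
  where
    go !acc h = do
      chunk <- B.hGet h chunkSize          -- returns an empty string at EOF
      if B.null chunk
        then pure acc
        else go (step acc chunk) h

-- | Count newline bytes in one file without loading the whole file.
countLines :: FilePath -> IO Int
countLines path =
  withBinaryFile path ReadMode (foldChunks 65536 step 0)
  where
    step !n chunk = n + B.count 10 chunk   -- 10 is the byte value of '\n'

-- | Total over many files.  Each file is opened, folded and closed before
-- the next one is touched, so memory use does not grow with the file count.
countLinesInFiles :: [FilePath] -> IO Int
countLinesInFiles = foldM addFile 0
  where
    addFile total path = do
      n <- countLines path
      pure $! total + n
```

Calling `countLinesInFiles` on a long list of source files should need roughly one chunk of memory at a time, regardless of how many files are in the list. Whether darcs's patch operations can be restructured into this shape is exactly the open question.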

Some of the bugs I just peeked at hinted that our handling of pending and
our sorting of changes cost us dearly in the memory efficiency category.
Maybe I can get a profile build on this machine and check with that.

Jason