On Mon, Aug 17, 2009 at 10:07 AM, Eric Kow <[email protected]> wrote:

> On Mon, Aug 17, 2009 at 09:47:26 -0700, Jason Dagit wrote:
> > During darcs record I noticed that darcs needed 1.6GB of memory just to give
> > me the list of hunks and adds. I typed 'a' to accept all and darcs was
> > killed by windows when it reached about 2GB of memory usage.
> >
> > I could be mistaken, but I thought back in the darcs 1.0.x days David
> > optimized darcs for this case and you could actually add as many files as
> > you wanted without running out of memory.
>
> Check out http://bugs.darcs.net/issue80, particularly
> http://bugs.darcs.net/msg2981 in which David says we used to have
> special case code for record which we lost in Darcs 2 for code
> cleanliness.

Indeed. I also found this message, which (I think) refers to an older approach to the same problem:
http://bugs.darcs.net/msg4077

I can't argue with David's logic there. Having a special case that lets you record a patch you can never work with seems bad.

> If I understand correctly, David says that the real solution is to
> implement chunky hunks http://bugs.darcs.net/issue1357

I don't know anything about chunky hunks, and I don't see them mentioned or linked in that ticket. Where can I find more details?

> This complaint just came up on reddit yesterday. I've added
> a troubleshooting entry for it:
> http://wiki.darcs.net/Troubleshooting#darcs%20record-runs-out-of-memory
>
> Meanwhile, I'd be very interested to hear about ways we can improve this
> situation. For example, I wonder if there is any way to get the
> benefits of chunky hunks without actually having to change the repo
> format?

I'm still convinced we can substantially improve this by being explicit about hunk loading. The objections are that it would make our core code less elegant and that it would require a lot of coding and testing, so I can see why writing this code has remained a low priority.

Anyway, to be more concrete, I think this is where we apply the left-fold based IO (iteratee) ideas that Oleg et al. came up with. We could potentially implement operations like commute so that they only need to look at portions of a hunk at a time. Even if commute had to hold the entire file in memory, we should still be able to implement the other patch operations so that they process patches in bounded buffers. In other words, even if the largest patch we can commute is limited by what we can contiguously mmap, we should still be able to optimize this case of adding many reasonably sized files.

Apart from a few tiny binary files such as icons, all the files in the repository in question are "small" text files containing source code.
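
To make the bounded-buffer idea a bit more concrete, here is a minimal sketch of a left fold over file contents, written against plain Data.ByteString rather than any iteratee library. It is not darcs code: foldChunks, foldFile, countLines and chunkSize are hypothetical names I made up for illustration, and counting newlines is only a stand-in for whatever a real patch operation would compute. The point is just that the consumer sees one fixed-size chunk at a time and each file is closed before the next one is opened, so peak memory should depend on the chunk size rather than on the number or size of the files.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import Control.Monad (foldM)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- | Hypothetical buffer size (64 KiB); not a value taken from darcs.
chunkSize :: Int
chunkSize = 64 * 1024

-- | Strict left fold over the contents of a handle, one chunk at a time.
-- Only the accumulator and the current chunk are live at any point.
foldChunks :: (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldChunks step = go
  where
    go !acc h = do
      chunk <- B.hGetSome h chunkSize   -- returns empty at end of file
      if B.null chunk
        then return acc
        else go (step acc chunk) h

-- | Fold over a single file; the handle is closed before returning.
foldFile :: (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldFile step z path = withBinaryFile path ReadMode (foldChunks step z)

-- | Example consumer: count newline bytes (ASCII 10) across many files,
-- threading a single accumulator through all of them.
countLines :: [FilePath] -> IO Int
countLines = foldM (foldFile (\n c -> n + B.count 10 c)) 0
```

Something like `countLines ["a.hs", "b.hs"]` would then walk both files in 64 KiB pieces. A real iteratee would additionally let the consumer stop early and would deal with errors and resource cleanup more carefully; this only shows the memory behaviour I am after.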

Some of the bug reports I just skimmed hint that our handling of pending and our sorting of changes cost us dearly in memory efficiency. Maybe I can get a profiling build working on this machine and check.

Jason
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
