Jason Dagit <[email protected]> writes:
> Just curious, what is a "hard" merge? Computationally difficult or is it a
> forceful merge? If someone could look into why that hangs it would be
> valuable information I suspect. I bet David would be interested in that bug
> report, for instance.

A hard merge is defined as one that makes darcs hang or crash. ;)

> If you run the ghc profiler on this merge, do you know where we spend all the
> time?

I'll try to get to that later, but I have other things to get to first. :|

> There are likely some choices:
> - abandon current darcs core altogether
>
> Could you clarify how you define 'darcs core' in terms of modules?

Hard call. Basically everything that defines things about patches. This is
unfortunately not very confined within darcs.

> - try to fix it ourselves
> It sounds like you've looked at this the most recently, do you have any ideas
> on how the fix would go? get_extra has always been a nasty little bit of
> code. I've tried to document some of the assumptions I'm aware of in that
> bit, but I've always suspected it to be a little brittle. It's also crucial to
> optimize that code because it is such a common code path and needs to do some
> real work.

The whole core is a little muddy to me. I don't expect to delve into this
myself, though.

> - try to merge David's work manually (and hope that he eventually fixes
> the bugs in the core we care about)
>
> A manual transplant (aka diff + patch) may be a good idea.

Basically the only reasonable way to get there for now.

> I'm not sure what you're saying :) You feel the current core is dead; okay
> fair enough. But, what are you proposing beyond asserting its death? Are you
> saying someone needs to rewrite it?

Sort of. I am saying that (a) we need to take measures to make replacing the
core feasible in the mid-term and (b) that someone eventually has to rewrite
it.

> Rewriting the core comes with a big burden of backwards compatibility. You
> could bump the repo format number (like we did with darcs2) and we could go
> through the process again with a darcs 3 format. But, other approaches, like
> rewriting all the core code while calling it darcs-2 format is risky because
> you could end up with different semantics (before and after the rewrite) for
> the same repo format.
> We also probably want to maintain the current code for
> quite a while so that we have true backwards compatibility with previous repo
> formats. Again, like the darcs-1 format to darcs-2 format transition.

This will absolutely need a new, incompatible format. Some of the bugs are
likely hardcoded in the "patch" format (as opposed to the repo format). Things
like file removals conflicting with edits, or hunks depending on adds (while
adds can conflict). Both are sources of problems that are nearly intractable
for current darcs. We may as well learn from those, since these have trivial
fixes...

(That is, hunks operate on abstract files that are never added or removed.
Instead, the abstract files are added to or removed from the "working" set by
a separate patch type, say "incarnate <some-uuid-thingy> <path>". Conflicts on
this level are very simple, and all other conflicts are confined within a
given abstract file. It may involve some trickiness on the UI level, but the
core part is trivial and fixes a whole lot of existing "core" issues.)

Also, we probably want Camp to provide us with new core bits, at least for the
things that happen inside these abstract files -- commuting and merging hunks.
I have no idea what happens to the megapatches (changesets) of Camp and how we
map things to darcs. I don't know what the camp-core API will look like. Most
of this is an open question.

Now of course, there is the question of backwards compatibility, which is
basically (a) from above. This is not only about the user level, but also
about code. So to preserve both user-level compatibility and
developer-friendly hackability, I would propose some changes for the 2.4 and
maybe 2.5 horizon:

We already have the "format" file, and we decide certain things based on this
format. What I would like to see is to push the format distinction to a much
higher level, to the point that we can have separate command implementations
depending on the format in force.

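To make the format-level split a bit more concrete, here is a minimal sketch
in Python (not darcs code). Everything in it is an assumption made for
illustration: the command tables, the function names and the
"hypothetical-darcs-3" marker are invented, and the format file is simplified
to one property name per line.

```python
from typing import Callable, Dict, FrozenSet

# A command takes a repository path and returns a status message.
Command = Callable[[str], str]

# Hypothetical marker for a future, incompatible format (an assumption).
NEW_FORMAT_MARKER = "hypothetical-darcs-3"


def parse_format(text: str) -> FrozenSet[str]:
    """Collect the non-empty lines of a format file as a set of properties."""
    return frozenset(line.strip() for line in text.splitlines() if line.strip())


def legacy_record(repo: str) -> str:
    # Stand-in for a command implemented on top of the existing core.
    return f"legacy core: record in {repo}"


def new_record(repo: str) -> str:
    # Stand-in for the same command implemented on top of a new core.
    return f"new core: record in {repo}"


# One command table per core; the UI layer only ever sees a table.
LEGACY_COMMANDS: Dict[str, Command] = {"record": legacy_record}
NEW_COMMANDS: Dict[str, Command] = {"record": new_record}


def select_commands(properties: FrozenSet[str]) -> Dict[str, Command]:
    """Pick the whole command table once, based on the repository format."""
    return NEW_COMMANDS if NEW_FORMAT_MARKER in properties else LEGACY_COMMANDS


def run(command: str, repo: str, format_text: str) -> str:
    """Dispatch a command after deciding which core handles this format."""
    table = select_commands(parse_format(format_text))
    return table[command](repo)
```

The point of the sketch is only that the format decision happens once, at the
top, rather than being scattered through the core: a repository in an old
format never touches the new command table, and vice versa.
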
This will make it possible to build up a new core beside the existing one,
without having to do a lot of extra work. Basically this just needs a refactor
of the UI layer, and probably not even a hard one. When we have this, we can
start building a new, more bottom-up library, and also use it in commands for
new repository formats, without compromising backwards compatibility or
stability.

To get the hackability, we could move everything into Darcs.Legacy and start
with a clean Darcs.* namespace (the alternative is to keep Darcs.* as it is
and come up with a new namespace). We would then probably move and refactor
parts of the Legacy, moving them up in the hierarchy to new Darcs.* modules
(out of Legacy). We can have certain unit testing coverage requirements for
such modules. We can of course hack on things in Legacy to improve them or fix
bugs or such -- nothing is frozen just because of this move.

To back the process a little with actual experience, we have done something
very similar in a different (C++) project. We basically decided that a
complete rewrite would probably be quite useful, but quite expensive. So we
stashed away all existing code and re-used it as we could, sometimes wrapping
things, sometimes refactoring things and moving them out of legacy, sometimes
replacing them with new implementations. This will be a little more
challenging in darcs: in the other project the UI was mostly trivial, so we
could afford to throw it away and write it from scratch, and it was never part
of the legacy code library. But I assert this would definitely be possible.

> I like the idea of the core getting attention and cleanup. In particular, I
> would love to see what impact there is to switching to a left fold based io
> system (the Oleg iteratee stuff). It's a massive rewrite as near as I can
> tell, but I think it would give us more explicit control over the algorithms
> and allow us to fine tune our performance better.
> My intuition is that we
> could apply the iteratee approach to both streaming data from the harddrive/
> network (for parsing) and also to the task of streaming patches from the
> repository (after parsing we generate patch sequences in an iteratee fashion).

Well, left fold is a great abstraction, but it has some issues when it comes
to darcs... That is, folding is nice for streaming, but many darcs operations
require random access -- this is not something I expect to translate well into
an iteratee-like approach. Basically, the only operations that would really
take advantage of this would be those that work from the top of the patch
stack linearly downwards: applying and unapplying patches, and even then only
for the patch input -- the working copy needs to stay random-access.

Yours,
   Petr.
