I am a long-time user of Darcs, having begun to use it in either 2003 or 2004, and with very few exceptions I have been thoroughly pleased by the experience of using Darcs, as well as by the responsiveness of the Darcs team to my quibbles in #darcs. Thank you!
Even though I have used Darcs for a long time, I am still merely a novice user, and, bizarre as this may sound, I should like to stay that way, because I believe that revision control is too fundamental in software development to require expertise of the majority of its users -- and Darcs has supported my belief marvellously. However, this may cause my questions and comments to seem naive and to use the wrong terminology, for which I beg your pardon in advance.

I have two questions, prompted by Bryan O'Sullivan's two remarks about Darcs in the article <http://queue.acm.org/detail.cfm?id=1595636>.

The first of his remarks was about the performance of Darcs: `Why isn't everyone using Darcs, then? For years, it had severe performance problems that made it completely impractical. These have been addressed, to the point where it is now merely quite slow.' I presume that by the `performance problems that made it completely impractical', O'Sullivan meant Darcs's exponential-time merging algorithm, a problem which I understand arises far less frequently in Darcs 2.

I am not sure what parts of Darcs O'Sullivan meant to describe as `merely quite slow', but I have always been frustrated by the performance of `darcs changes <pathname>' and `darcs annotate <pathname>' in large repositories. My reading of src/Darcs/Commands/Changes.lhs suggests that get_changes_info uses filter_patches_by_names to go through the entire list of the repository's patches. Similarly, annotate_file in src/Darcs/Commands/Annotate.lhs calls getMarkedupFile, whose auxiliary routine do_mark_all also appears to go through the entire list of the repository's patches. This seems highly suboptimal -- for most uses of these commands, surely they should run in time linear in the number of patches that actually relate to the files the user has passed to them.
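To make the cost difference concrete, here is a minimal sketch in Python. It is not Darcs code and does not reflect how Darcs stores patches: the patch records, the function names, and the idea of a patch carrying a plain set of touched paths are all assumptions made for illustration, and renames are ignored entirely.

```python
from collections import defaultdict

# Hypothetical patch records: (patch name, set of paths the patch touches).
# This is an illustration only, not Darcs's patch format.
PATCHES = [
    ("add README", {"README"}),
    ("add parser", {"src/Parser.hs"}),
    ("fix parser bug", {"src/Parser.hs", "README"}),
    ("add printer", {"src/Printer.hs"}),
]

def changes_by_scan(patches, path):
    """Visit every patch in the history; cost grows with the whole repository."""
    return [name for name, touched in patches if path in touched]

def build_index(patches):
    """One pass over the history, mapping each path to the patches touching it."""
    index = defaultdict(list)
    for name, touched in patches:
        for path in sorted(touched):
            index[path].append(name)
    return index

def changes_by_index(index, path):
    """Return only the patches recorded for this path, without a full scan."""
    return index.get(path, [])

index = build_index(PATCHES)
print(changes_by_index(index, "src/Parser.hs"))
```

Both functions give the same answer for a path; the difference is that the scan does work for every patch in the repository on every query, whereas the index pays that cost once when it is built and afterwards does work only for the patches listed under the requested path.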
After some discussion in #darcs a while ago (months, perhaps a year or two), I believe Jason Dagit (lispy) told me that he had implemented some kind of on-disk cache mapping pathnames to lists of patches that could affect the files at those pathnames. I don't remember the details, but what prompted his mentioning it was my describing a very conservative cache that would track all renames and treat as one any files that ever had the same pathname. E.g., if I had FOO renamed to BAR, and then created another file FOO, and renamed BAR to BAZ, then both these files would conservatively be assigned the same list of patches. Such a conservative cache is safe because its purpose is only to shorten the list of patches to consider for each file, not to determine that list precisely, and the cache can always be rebuilt if lost.

In any case, irrespective of precisely how this cache is constructed, will any such mechanism ever be included in Darcs to reduce the frustration of waiting for `darcs changes <pathname>'?

The second remark made by O'Sullivan in his article was: `Its more fundamental problem is that its theory is tricky to grasp, so two developers who are not immersed in Darcs lore can have trouble telling whether they have the same changes or not.' I am emphatically not immersed in Darcs lore, but it has always been my intuitive impression that if two repositories have no patches to pull from or push to one another, then they have identical contents. In #darcs, Simon Michael (sm) answered `yes' when I asked whether this is true. I didn't say exactly what I meant by `identical contents', because, as I said, this is only an intuitive impression. Obviously I don't mean that *everything* is the same (e.g., the preferences), but at least the state of the pristine tree and the collection of patches.
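The intuition can be stated in a few lines of Python. This is only a sketch under a strong simplifying assumption: each repository is modelled as a set of opaque patch identifiers, with no notion of ordering, conflicts, or the pristine tree, and the function names are hypothetical rather than anything Darcs provides.

```python
def to_pull(local, remote):
    """Patches the remote repository has that the local one lacks."""
    return remote - local

def to_push(local, remote):
    """Patches the local repository has that the remote one lacks."""
    return local - remote

def same_patches(local, remote):
    """True when nothing would be transmitted in either direction."""
    return not to_pull(local, remote) and not to_push(local, remote)

mine = {"patch-a", "patch-b", "patch-c"}
theirs = {"patch-c", "patch-a", "patch-b"}
print(same_patches(mine, theirs))
```

In this model, `nothing to pull and nothing to push' is exactly set equality, which is the sense of `identical contents' I have in mind; whether Darcs's actual patch theory justifies the model is the part I am asking about.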
If this is so, then despite what O'Sullivan says, it seems unnecessary to be immersed in Darcs lore in order to tell whether two repositories have the same state, if one is reachable from the other: if `darcs pull' followed by `darcs push' report no patches to transmit in either direction, then the repositories have identical contents.

However, this is a heavy-handed test, so I wonder: Can the state of the repository be summarized in a concise string such as a hash, or dumped consistently, say, to a stream of bytes that can be piped to `openssl dgst', so that if two repositories have identical contents, then they will show the same summary (hash)? What if I said `iff' instead of `if'?

Thank you, and sorry for the long message!

_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
