On 04/10/2018 08:34 AM, Stephen J. Turnbull wrote:
> Ben Franksen writes:
> > On 29.03.2018 at 10:08, Stephen J. Turnbull wrote:

> > Internally we do use references, similar to git (we refer to patches, inventories, and trees via content hash). But in contrast to git, these are not exposed as a user visible concept. Tags are somewhat special; they do serve to identify versions, i.e. what git uses refs for. But since their behavior is specified in terms of patch dependencies, they are not really an exception to the rule.

> I think you're taking the implementation too seriously.
That comes from talking too much about git ;-)

> Any user who understands what a ref is will say "a Darcs tag is too a ref!" I think.

Perhaps (but you won't, right?).

> > > I would think "link 'em all" is a better default for most projects, except that in git branch refs are really lightweight, so developers are likely to have a bunch of obsolete or experimental branches lying around that you don't want.

> > Good point. I was thinking about "official" branches only, not experimental/feature/whatever branches that anyone can and does create all the time.

> How do you identify "official"?

I can't, unless it's an "official" repo to start with (e.g. http://darcs.net/) and then I would assume that all branches are "official" (assuming darcs had branches).

> I think that by now the great majority of git users use it because their projects are hosted on GitHub or GitLab, and mostly it's obvious what the "official" branches are.

Yes.

> But the maintainers of git, who target themselves in designing new features for it, are very much peer-to-peer sharers, cloning or pulling branches from a wide assortment of repos. This is what "distributed" means to them. I doubt they'd be willing to make "export all branches on clone" a default, and it's not clear to me that the "I just want to see the mainline" aren't the majority.

How do they identify "the mainline"?

> > > This is how Subversion works (and CVS before it and Bazaar "lightweight checkouts" after it). With that restriction, distributed development is painful. Avoiding that restriction is why Arch, BitKeeper, git, Mercurial, Monotone, Bazaar, ... were developed.
> > > Darcs, too. :-)

> > I don't understand. What has distributed versus centralized to do with it? I'd say in a centralized system there is only one "remote", so the question is moot. Is that what you mean?

> No, it's not.
> By distributed *development* (as opposed to distributed VCS), I mean a situation where multiple people are updating a single mainline. Even with distributed VCS, there's normally an "official" repo with an "official" master branch in it.

Indeed. At work we use Darcs for development of several medium-sized control systems for scientific instruments. The largest has about 100000 process variables and is distributed over almost 100 I/O controllers. Patches are pushed to the central repo by 14 developers. We almost never have conflicts because people tend to work on different parts. I usually have a handful of "feature branches" locally, where I keep stuff that isn't ready for production; but for minor changes I just use an up-to-date clone of the central repo (i.e. one where I have no local patches and where I just pulled), make the change, record it, and push. (Note that push does not mean it goes into production immediately; there is a separate distribution step. This process also records a snapshot of the state of the source tree using 'darcs log --context', which is occasionally useful for forensics.) It is an extremely smooth process and the way we use it really is "distributed development".

> This means that from an individual developer's point of view, the state of master is a triple:
> (1) what's actually in the official repo (unknown; another dev may have updated),

True. (Though it is easy to check whether this is the case (hg incoming, darcs pull --dry-run, git <whatever>).)

> (2) what's recorded in your workspace's metadata, and
> (3) what's actually in your working tree.
> A centralized VCS doesn't allow you to commit unless (1) == (2). A distributed VCS does. But I think a lot of users' intuitions are informed (though not fully determined) by this constraint that (1) and (2) are supposed to be "the same".

Perhaps people who have worked with CVS or Subversion for years, but certainly not people familiar and comfortable with Darcs.
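To make the three-part state concrete, here is a minimal Python sketch of the rule "a centralized VCS only lets you commit when (1) == (2)". The class, the function, and the version labels such as "r42" are hypothetical illustrations, not the behavior of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    official: str      # (1) what is actually in the official repo
    recorded: str      # (2) what the workspace metadata says
    working_tree: str  # (3) what is actually in the working tree


def may_commit(state, centralized):
    """Return True if a commit is allowed under the rule quoted above."""
    if centralized:
        # Centralized: the workspace must be up to date with the official repo.
        return state.official == state.recorded
    # Distributed: a local commit is always possible; divergence is
    # reconciled later by merging (or by pulling patches).
    return True


# Another developer has updated the official repo since our last update.
stale = MasterState(official="r43", recorded="r42", working_tree="r42 + local edits")
allowed_centralized = may_commit(stale, centralized=True)
allowed_distributed = may_commit(stale, centralized=False)
```

With the stale workspace above, the centralized rule refuses the commit while the distributed rule accepts it, which is exactly the difference in intuition being discussed.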
> > This is all uncontroversial IMO and has nothing to do with the question we are discussing.

> This makes me unsure what you think the questions we discuss are.

I am unsure myself because I have lost some of the context.

> I think that one question is about how in a DAG-based system it's always possible to identify all past states of all relevant upstream branches (although you may not be able to recover the names after merges), whereas in Darcs this is fuzzy (you can only do this for tags).

I think this is correct.

> > > Darcs avoids all this by modeling a branch as a history-less set of patches. Of course the semantics of text require certain implicit dependencies (you can't delete a line that doesn't exist in the text).

> > (there are systems that do allow that, but not Darcs)

> token-replace has an analogous aspect. But I didn't mean "allows you to specify deletion of a line that doesn't exist", I mean that it's useful for some kinds of patches to obey such contextual constraints because people think of them that way.

Yes.

> > I am not sure I want "semantic" dependencies. The best a general purpose text based tool can give you is a crude approximation of it.

> Of course, a semantic VCS would require a very large amount of "knowledge" of the language(s) used in the "document". Such a VCS could reasonably be called "AI" in the classic sense of passing the Turing test in a limited domain.

> > The version (DAG) based systems approximate on the "safe" side: any change, even the smallest, semantically irrelevant one, introduces a dependency.

> > Darcs chooses to err on the "flexible" side: by default only the minimum (technically necessary) dependencies are introduced.

> I don't think this comparison is entirely accurate. All DAG-based systems permit cherry-picking and rebasing, although those like Mercurial and Bazaar do try to deprecate rebase. In git they are first-class operations.
Cherry-picking is an attempt to get the effect of patch commutation without paying the price. You get what you pay for: an ad-hoc solution that may or may not give you the results you expect. Making sure the results are what you expect is tedious and error-prone and I understand if people are nervous about it ("untested versions, gaah!").

> By the way, it's never been clear to me that patch algebra is more effective than the brute-force "try a cherry-pick or rebase and see if it works" approach of DAG-based systems. Leaving aside the meta-patches (file renaming, creating empty directories) where git is clearly deficient, does patch algebra allow you to avoid some conflicts that would occur in a DAG-based system?

Some, yes. It depends a lot on the foundation, i.e. the concrete implementation of your patch algebra. It also depends on how conflicts are detected in the DAG-based system. The simplest example with current Darcs is a replace and a hunk, which usually do not conflict even if the hunk overlaps with the replace. I am pretty sure there are other examples involving only hunks, but I don't have one ready to present.

But that is not the main point. The main point is that the patch algebra frees you from having to worry about history, /except/ when it is relevant, i.e. when patches have dependencies.

> If not, what is the great advantage of patch algebra from your point of view? Is it related to the ability to claim the same branch identity for two workspaces that "haven't diverged too much", where a git rebase in a published branch all too often results in an unusable mess of conflicts?

Well, my experience tells me that "an unusable mess of conflicts" can happen with Darcs in just the same way.

It is an interesting question. I think the advantages of the patch algebra are not something that you normally observe directly. Rather, they are the underlying foundation which makes it possible to view a repo as "a set of patches".
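The "set of patches" view can be illustrated with a small Python sketch. It is a deliberately simplified model under a loud assumption: all patches are independent (no dependencies, no conflicts), so any two of them commute. The patch names and the `pull` helper are hypothetical and do not reflect how Darcs is implemented.

```python
def pull(target, source):
    """Append to target every patch from source that target lacks.

    Assumes all patches are independent, so appending never conflicts.
    """
    have = set(target)
    return target + [p for p in source if p not in have]


# Two repos that recorded the shared patches in different orders.
mine = ["A", "B", "C"]
yours = ["B", "A", "D"]

shared = set(mine) & set(yours)       # patches we both have
to_exchange = set(mine) ^ set(yours)  # patches only one of us has

# After pulling in both directions the linear orders still differ,
# but as sets of patches the two repos are the same.
mine_after = pull(mine, yours)
yours_after = pull(yours, mine)
same_as_sets = set(mine_after) == set(yours_after)
same_order = mine_after == yours_after
```

In this toy model the two repos end up equal as sets while their recorded orders remain different, which is the sense in which order stops mattering once no dependencies are involved.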
As I said above, this frees you from having to worry about the ordering of changes when the order is irrelevant. My (linear) history may be different from yours, but the patch algebra guarantees that for the intersection of (the patches in) our two repos the order of the patches doesn't matter! The only difference that matters is the symmetric difference, i.e. the patches you have but I don't and vice versa. When I pull a patch from your repo and it doesn't conflict, I have enlarged the intersection and reduced the (symmetric) difference. When I repeat this, and also push, and everything merges cleanly, then our repos are semantically identical, period. I just don't have to care about the order; either one is fine.

[I am dropping the parts about remotes in git, I think the important things have been said.]

> > and I am using that feature in practice. I am pretty sure the Darcs model would scale to a large number of developers but I have no proof.

> I'm not sure what you mean by "model" and what you mean by "scale". The problem with repo per branch per version is just multiplication, when you realize that every repo of every developer is a different version. Storage blows up, the naming conflicts will be frequent unless you're willing to endure network outages and delays, and URLs for personal repos are often long and/or unintuitive.

Yes, storage blow-up is a problem, and another one is discoverability, which is why I want to add branches to Darcs. I don't understand what you mean by "naming conflicts will be frequent".

> > > If there are multiple people with push permission, your *VCS* will need a conceptual way of referring to content that is intended to end up in the "official" branch that diverges from other content also destined for that branch (or already incorporated in that branch).

> > Of course.
> > But then, assuming I do not want to push changes to "master" because this is how the project is organized, then I just don't do it, right?

> Long experience shows that is easier said than avoided. ;-)

I never had this problem but YMMV.

> This is why git has an "url" variable for each remote used for fetch and push by default, but also provides "pushUrl" in case you want to differentiate the destination by operation.

Okay.

> > > > I should rather have created my own branch and committed there, so the remote owner of the branch can integrate my changes with a merge?

> > > I'm not saying "you should", I'm saying "you do". In a DVCS, by committing locally you *do* create your own branch.

> > Yes, exactly. I took all this for granted, which is why I asked "so what"?

> Rebasing, which doesn't screw up history in Darcs the way it does in a DAG-based system because Darcs deliberately doesn't keep history. Despite what you say you take for granted, this allows Darcs users to identify the remote branch with their own. Darcs conforming to users' expectations in this way may be a good thing in practice, but I think this expectation is one of the main causes of inability to grasp git.

This is quite possible.

> (That may or may not be a problem, depending on which projects you want to contribute to, of course!)

> > With branches, things may be different. It may make sense to have push behave in a "safe" way by default, that is, create a different remote branch.

> This practice was a huge annoyance with the default set up of Mercurial (as of the time XEmacs converted to Mercurial). I would never recommend it: most such pushes would just get lost. In git, though, you always push to a specified branch (which may be implicit in a .git/config variable).

Yes, I tend to think this is the right way.
> > > Only because you don't have multiple branches in one repo, so URL of repo == name of branch == only ref that ever matters to you, and it's mostly trivial to keep track of "here vs there".

> > Yes, there is some truth to that. I would very much like to retain this conceptual simplicity even when/if we add branches to Darcs. I think that if we use the current model as a guide, then we can achieve that:

> > A URL+branch in Darcs-with-branches behaves like a URL does now. A branch alone is short for "the local repo"+branch. A URL alone means URL+"default branch", where "default branch" is the name of the branch you are on, unless configured otherwise. Everything else remains as it is.

> This is what happens in git now, except that you are able to set your own defaults in .git/config, and provide aliases for URLs (the remotes). You can argue that remotes provide more confusion than convenience if you like, but several years of experience have shown that for the vast majority of git users it's the other way around.

You sound so confident when you say that. As if the git we have today was the result of incorporating years of user feedback. OTOH you keep telling me that git is the way it is because the developers have made and still make it for their own good, primarily. And that the UI is more or less frozen because Mr. T. said so many years ago.

> Whatever confusion is experienced due to remotes, the convenience gained is much greater.

If you say so...

> This is not true for branches. "Colocated branches" (ie, the many branches per repo model) do seem to cause confusion. My guess is that a Darcs-with-branches would have the same problem.

I hope we can avoid that.

> > Another point where it is problematic to transfer concepts naively. Yes, in a way this is what obliterate would do, more specifically 'darcs obliterate --not-in-remote'.
> > This is not something I would associate with "rollback" in the transactional sense, even though I admit that technically it is. (My view of transactions is that they are short-lived deviations from the one-state-for-all norm.)

> In context, "short-lived deviation" is exactly the sense I meant: in case of a merge with way too many conflicts, you want to "rollback" to the pre-merge state.

But doesn't this lose the changes you made? I have in the past abandoned branches I made when working on Darcs, but never because of conflicts, but rather because I found that what I tried didn't work out as planned.

In the situation where I have complicated conflicts, I usually use 'darcs rebase' to resolve them one patch at a time. The work-flow is like this: you say 'darcs rebase pull', which suspends any local patches that conflict with remote ones. The suspended patches form a sort of stack (similar to Mercurial's mq extension, but with 'real' Darcs patches, so re-orderable) and you can now 'darcs rebase unsuspend' them one at a time, resolve conflicts, then amend. Re-ordering of suspended patches is subject to the normal Darcs commutation rules, so dependencies may constrain the order in which you are allowed to unsuspend. My experience is that it is much easier to resolve complicated conflicts in this step-wise fashion.

> > I see. In my view of things this is yet another point where the Darcs model gives you an advantage without any additional effort.

> Because patch algebra allows you to compute that there *will* be a conflict without applying any patches?

That is not what I meant (although it does matter). I just meant that the "rollback" is something you get without the need to "tag" the point where you started.

> But I recall lots of cases where I'd get a nasty conflict in the workspace that I wanted to back out of in Darcs.
> obliterate would require a checkout in those cases to restore sanity in the working tree (I believe that was implicit). That's no different from git.

Yes, the traditional way (i.e. w/o 'darcs rebase') is pretty similar. Assuming you allowed conflicts and had no unrecorded changes in the working tree, the simplest way to back out of too many conflicts is to 'darcs revert --all' (this is akin to 'git reset --hard') and then obliterate the patches which have conflicts (by default this will be patches you just pulled from the remote repo) until the repo is free of conflicts. If you had unrecorded changes you are out of luck: they will be mangled with the conflict markup and there is no way to separate them. This is why I recommend setting --no-allow-conflicts as the default for push /and/ pull, so you need to explicitly say --mark-conflicts (and hopefully remember to record any unrecorded changes).

> > I have no problem with a tool that is powerful and allows me to play all kinds of tricks, as long as these tricks don't violate the internal invariants that hold everything together.

> git is internally invariant,

I thought (rather: hoped) so.

> with the single exception of git-prune, and git-fsck and git-gc which call it under certain circumstances.

Okay.

> > In git, when you rebase, it is the user's responsibility to ensure consistency, and humans are notoriously bad at things like that.

> Sigh. This simply isn't true. *The DAG is immutable.*

Ah, I never doubted that the DAG remains consistent in itself. What I meant is the consistency of the changes to your tree. For instance, if you use cherry-picking to re-order changes, can you be sure that after picking all the commits in a branch the resulting tree will be the same as in the original? I don't think so.

> > Ugh. Three branches.

> There are *always* at least three branches: remote repo, local repo, working tree. This is just as true of Darcs.
> Darcs just makes it easier to maintain the fiction that they're all near-identical versions of "the same thing", by (1) abandoning consistency of temporal history with the underlying object database, and (2) requiring that persistently different branches be given their own repositories.

> (1) and (2) are a very good tradeoff for many users. My guess is that they would turn out to be a bad bet if you tried to manage the Linux kernel or GCC development with Darcs.

With Darcs as it is: no, that would not work. Assuming a modernized version of Darcs with in-repo branches, better (guaranteed to be efficient, i.e. polynomial, ideally linear) conflict handling, and a more efficient representation of binary hunks: yes, I think this would be possible and would actually work better than git.

> (Let me say here that I think the GHC switch was an unfortunate historical accident, not evidence for this guess. I think GHC is big enough to test scalability of the Darcs model, and was then, but that's not why they switched AIUI.)

No, that had to do with deficiencies of Darcs' conflict handling (they used darcs-1 format) and with other severe performance problems, many of which have been solved in the meantime (but not all, sigh).

> > > > I also detest that I have to register remote repos locally in order to refer to them in commands, giving them some arbitrary local name, when they already have a perfectly good universally valid name (the URL).

> I disagree that the remote's URL is a *perfectly* good name, because the version it refers to is *unstable*. In git, you know that "git diff origin/master" will give the same result every time, until you fetch that branch. In repo-per-branch models, you don't know that, because somebody may commit in the other repo.

You are right that what the URL denotes is unstable in this way. A good point that I did not understand at first.
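The stability argument can be sketched in a few lines of Python. This is only a toy model of the idea, assuming a remote whose head can move at any time and a local snapshot that changes only on an explicit fetch; the class names and the commit labels "c1" and "c2" are hypothetical.

```python
class RemoteRepo:
    """A remote repository whose branch head may move at any time."""

    def __init__(self, head):
        self.head = head

    def commit(self, new_head):
        self.head = new_head


class LocalRepo:
    """Keeps a snapshot of the remote head, like a remote-tracking branch."""

    def __init__(self, remote):
        self.remote = remote
        self.tracking = remote.head  # snapshot taken at clone time

    def fetch(self):
        # Only an explicit fetch moves the snapshot.
        self.tracking = self.remote.head


remote = RemoteRepo("c1")
local = LocalRepo(remote)

remote.commit("c2")            # somebody commits in the other repo
before_fetch = local.tracking  # the snapshot has not moved
live_view = remote.head        # what the bare URL denotes right now
local.fetch()
after_fetch = local.tracking   # the snapshot moves only now
```

Comparing against the snapshot gives the same answer until the next fetch, whereas comparing against the live URL depends on what others have done in the meantime.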
I still find it interesting that in Darcs I never missed remote tracking branches yet. That said, I do normally have one local branch (clone) where I make no changes but just pull patches from my "feature branches", for instance when I recorded unrelated changes on them and want to send/push them separately. But I normally don't see the point in remembering exactly where I was when I made that clone, instead I freely pull from the mainline before I apply my own changes, in order to avoid conflicts. I guess the work-flow with Darcs is just different enough that some concepts (or problems) simply do not transfer naturally.

> > """
> > When a local branch is started off a remote-tracking branch, Git sets up the branch (specifically the branch.<name>.remote and branch.<name>.merge configuration entries) so that git pull will appropriately merge from the remote-tracking branch. This behavior may be changed via the global branch.autoSetupMerge configuration flag. That setting can be overridden by using the --track and --no-track options, and changed later using git branch --set-upstream-to.
> > """

> > I am getting headaches from this. I think it means (but I am far from sure) that to get the behavior I want, I should checkout a remote tracking branch and then start a local branch from that?

> In fact what it means is that you normally don't need --track because that's the default, and you don't need to checkout the tracking branch (you only do that if you want to look at the corresponding working tree). You just need to define the remote:
>
>     git remote add ben http://ben.net/~ben/public-repos/repo-of-interest
>
> and then the local branch:
>
>     git branch somebranch ben/somebranch

Okay, thanks.

> > > git users usually have a bunch of obsolete or experimental branches lying about, that you would not be interested in tracking.

> > Granted.
> > So how does git know which branches you are interested in and which not? Simple: you are (supposed to be) interested in whatever the remote has named "master". No?

> No, that default is only for a clone, and it's whatever is checked out in the source repo, which is usually "master" for a public repo.

But this is horrible. "Whatever is checked out in the source repo" is completely unpredictable (unless you make sure it is a bare repo so nobody would checkout anything there).

> > For personal repo where a developer uploads all the stuff he/she is working on, you could clone the one branch you are interested in. (I don't know if you can clone a branch in git.)

> You can. Like "shallow clones", in modern git there's really only a point if you are space- or bandwidth-constrained.

I thought so.

> > What about the sharing with colleagues? (Of configuration changes or new features or fixes that aren't ready for upstream.) As I understand, in your work-flow these are all either local branches in a repo in your home dir, perhaps on your own computer. Or else, you push them for all the world to see in a branch to the upstream repo. Both of which aren't ideal IMO. You really want a third repo in between upstream and local for that.

> Yes, as I describe above these days it's typically on GitHub.

Unacceptable in many companies. Also unnecessarily slow, etc etc.

> > In git this must be a bare repo, so you cannot and aren't supposed to work in it, right?

> I guess; in practice on GitHub you can't work in it. I suppose setting it up as a bare repo does help prevent "wrong cwd" boo-boos.

And clones where you get whatever branch the developer has just checked out.

> > > Well, "origin" *is* an URI, relative to the local repository, if you're in one.

> > This is a contradiction in terms. The 'U' in URI stands for 'universal'.

> URIs come in relative and absolute forms. "origin" is a relative form.
Let's drop this and agree to disagree.

> > I did that a few days ago, because setting up the remotes correctly is just too much hassle for me.

> If you say so. "git remote add <alias> <url>" is all I've ever needed, though.

I will remember that.

> > Whenever you call something "immediately plausible" in git, it feels to me like we live on different planets. For instance, here you refer to "a commit's *content* object" and I have only a vague idea what that is.

> The content object in a traditional git repository is just a tree object (representing the top directory in the working tree). In a submodule, that can be a commit which comes from a different repository (that of the submodule's project).

Okay, this is how I understood it.

> > You said earlier that git represents a submodule as a tree object that is itself a commit. But it cannot be the commit that represents the current (pristine) tree in the submodule, else I could not make a commit in the submodule (or pull there) without making a commit in the containing repo/branch.

> I'm not sure what you mean by this.

I am trying to understand how submodules work in git. So I have a subdir "bar". The tree referenced by the current commit (of the supermodule) has an entry for "bar" and its content object is not a file but another commit. So suppose I pull a different commit inside the submodule. Would that not mean that the supermodule needs to change, too, i.e. refer to this new commit instead of the old one? But that cannot be, since the commit of the supermodule is immutable.... ahh, I think I do understand: git will show me this update as an uncommitted change! I can commit it in the supermodule and then it "officially" refers to this new commit of the submodule. Correct?

> > So the best it can be is the nominal version of the submodule, as specified in the .gitmodules file, right?

> Not quite.
> First, the .gitmodules file does *not* specify a version, except implicitly for the first checkout. After that, it will be specified by the commit representing that submodule in the tree object representing the parent directory of the submodule. Consider an example, below.

> We have app, the toplevel directory for our project, app/src, a directory containing the code for our main program, and app/lib, a submodule (directory) containing a library developed by another project. The contents of the file implementing a commit will look something like this (the comments following # are not part of the object):
>
>     commit-header-magic
>     parent: <SHA1>    # refers to a commit object
>     date: Tue Apr 10 01:58:00 2018
>     tree: <SHA1>      # refers to a tree object
>
> The tree object will look something like:
>
>     tree-header-magic
>     README: <SHA1>    # refers to a blob object
>     Makefile: <SHA1>  # refers to a blob object
>     src: <SHA1>       # refers to a tree object
>     lib: <SHA1>       # refers to a commit object

> Now, the SHA1 for lib is initialized to the HEAD of the repo named in .gitmodules. After that, if the repo in lib is changed, either by pulling new commits from lib's upstream or by a local commit, nothing happens immediately, but you can use "git submodule update <submodule>" to update that tree object (in the index). Normally, this just checks out the commit recorded in the tree object displayed above, but it can also be configured to merge any local commit or rebase the local commits in the submodule on the commit in the tree object. Then if you commit the main project, the tree object representing the project in the object database will be updated to reflect the HEAD commit in the submodule.

Thanks, I think this makes more sense now.

> > In a Unix file system, the inode represents file identity. It does not change when the file is mutated. This must be different in git, then, since a hash can only refer to a specific version of the file.
> > Does each blob object contain a reference to its previous version(s), or is tracking identity of files done only at the commit level?

> Tracking identity of files in the sense of UUID as you describe is not done at all. What happens is that tree objects associate file names with "blobs" of content. If this pair changes (a name disappears, a new name appears, or the blob associated with a name changes) git will check to see if the relevant blob, or one whose diff is only a "small" fraction of the filesize, appears elsewhere in the project. If so, git interprets that as a rename or a copy of the file. But it's not hard to imagine cases where files with independent origin get the same content (eg, empty files such as are used to ensure that directories are recorded in git).

Understood.

> > But there are now also ["ghost"] objects that are not manifest,

> Do they have content, or are they empty, waiting to be filled and plopped down somewhere on the file system?

They do have content, else you could not track changes to them. You can manifest such a file (or directory, i.e. tree) anywhere you choose and then it will "pop up" at the specified location (directory+name), out of nothing, including all the content :)

I am currently working on the details of how this is supposed to work out in practice. There are some open questions with regard to the design but I don't think this is the right place to discuss them.

> There's also a culture of "commit early and often" (and edit your commits), which keeps the working tree "close" to the local repo. The Mercurial and Bazaar communities have a "commit only complete, coherent changesets" bias, and when it gets extreme the working tree can get scarily divergent from the local repo.
In Darcs the "commit early and often and edit your commits" is also a pretty common mode of operation, though I know of people who are uncomfortable with editing patches and if in doubt rather unrecord and then re-record everything. One colleague of mine also likes to work in this way, but prefers to use Mercurial and uses the mq extension for everything until all intermediate versions are clean, tested, and ready for the mainline.

Cheers
Ben