Hi, So as a follow-up to last night's history editing debacle, I thought I'd send a couple of pointers on revision control. I've seen some rather bad things happening in the past, but I have let them slide until now. This is aimed mostly at Scott and Prageeth who haven't had a lot of experience with revision control (for this reason, I don't blame you guys, but it still needs improving). Now I have somewhat of a reputation as a "revision control nazi" ... I don't expect things to be perfect, but some of these commits are way out of hand. This is going to be a bit of a yell, but hopefully you'll get some important advice which should help you in all of your future projects.
There are several practical reasons to make shorter, more "atomic" commits. A key one here is that part of my job is to watch the commits and make sure the project is going OK. I tend to read all of the diffs, one commit at a time, but I can't understand a 1200 line diff. In general, if your diff (bzr diff | wc -l) is more than 1000 lines long, something is wrong. Another important benefit to committing small diffs is that they are much less likely to create a merge conflict if someone else is working on the same file (I was working on an open source project recently, and I had publicly stated that I had been working on a file for a month when another developer rudely reformatted all the tabs to spaces, causing me to have an entire-file merge conflict which took hours to resolve). *What went wrong* Basically, the commits from both of you are generally far too big. There's nothing fundamentally wrong with big commits, but they should address just a single issue. For example, my commit 39<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/39>added a lot of files, but they were all added at the same time because they were all part of the GWT template. Similarly, my commit 29<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/29>changes fourteen files, but it isn't out of hand, because it did exactly the same thing to each file (add a package import), and therefore, all of those changes belong in the same diff. By contrast (and I'm only picking on Prageeth because he's done a lot of "heavy lifting"), commit 33<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/33>is a 1265 line diff, with four independent changes mentioned in the log. Looking at that diff, it does all of the following: - Deletes and adds new imports to various files. - In Achievement.java, it deletes a huge number of lines, and adds them all back again with slightly different indentation. At the same time, it adds various new methods. - Adds tons of new methods to a wide range of classes. - Deletes some methods. - Adds new heading comments, tidying up the source code a bit. - In Game.java, a bunch of blank lines are deleted, more are added, and some blank lines just change the number of spaces/tabs. Notice the number of red/green lines in the Launchpad summary. OK so *most* of those things are helpful changes to the source code. But none of those activities are in any way related, and therefore they should not appear in the same commit. The changes which helped to tidy up the code (moving things around, adding comments, deleting unused imports, etc) are helpful, but nowhere near as important as adding new features. Therefore, add new features in one commit, tidy up in another. Then if I see a commit log "added new features" I will read it. If I see another commit "tidy up source code" I will ignore it. Do not feel embarrassed to make "a whole commit" just to clean up some comments -- it is much better to do trivial tasks in their own commit rather than include them with a bunch of other changes. In Achievement.java, we have obviously had tabs converted to spaces or vice versa. The entire file shows as a huge delete and a huge add. Therefore, the diff is *completely useless*. I can't possibly tell what you have actually changed in that file -- I can see that setGame is not in the red, only the green, so I guess you added that method. But I can't tell that you added that method from the diff, because the diff shows that you added *everything *. This is a small-scale example of what happened last night (you deleted and added a dozen files, *as well* as making around 500 lines of changes to those files -- I couldn't possibly tell what those 500 line changes were since all bzr diff showed was a dozen new files being added). So firstly, do you need to reformat all the tabs into spaces? That isn't something you should do lightly (as it means you take credit for all the lines of code and we can't see who really wrote them using bzr blame), but if it is, it is absolutely something which should take place in a separate commit, "changed tabs into spaces for consistency." You should *never* reformat code as well as making actual meaningful changes. Note: I am aware that Eclipse probably did this without you directly doing it, but I'll address that later. As for Game.java, there are just tons of lines being added and deleted for no reason whatsoever. Not as bad as Achievement, but they mess up the diff and make it very hard to read. These aren't *intentional* changes I assume, so why are they in the diff? *How to make better commits* So here is the bottom line: revision control is not about "it's time to make a backup" or "I'd better share my work with everyone else." The most fundamental aspect of revision control is that *you are absolutely in control of the changes you make*. The most important feature of any revision control system is "bzr diff". It tells you exactly what changes you are about to permanently make to the project. So Eclipse might be responsible for reformatting a bunch of lines of code without your permission, but *you*are the one responsible for making them permanent by committing them. "bzr diff" shows you exactly what changes you/Eclipse has made, "bzr commit" confirms that you approve of them. You should always run bzr diff immediately before bzr commit. If there is a single line of the diff which you did not intend to make or you cannot explain, then you should not be committing. If you see a random blank line has been added and there is no reason for it to be there, go back and delete it, so it doesn't pollute the diff. If a block of code has changed its indentation, go back and fix it so it is indented *exactly* the way it was before, so that the diff shows only the changes that are important. If you really want to fix up the indentation, fix it in a separate commit. Working like this gives you a huge benefit: you can be (relatively) sure you are not accidentally breaking some existing code. For example, if you have put a random debug print somewhere in some existing code, it will stick out like a sore thumb in the diff, and you can delete it before you commit. However, if you have a random debug print in a file that has had its indentation reformatted, then you won't notice it at all, and it may end up in the final product. Prageeth, I don't blame you for not knowing how to use bzr mv, but if you had checked the diff before committing r45, you would have never been able to explain all of those 3000 lines, and therefore should have known it was a bad idea. Scott, I don't blame you for not knowing how .bzrignore works, but if you had checked the diff before committing r46, you would have known that you were about to commit those lines which you thought were private to yourself. Remember, you are not committing your directory -- you are committing the diff, and that is what everybody else will see. It is very important to not commit bug fixes alongside any other changes. Each bug fix should have its own individual commit, so that we can test, yes, before this commit it was exhibiting a bug; after the commit it is not (and nothing else changed or broke). It's also not good to commit code that doesn't compile, as some of the commits I fixed last night did not. Please use 'ant javac' (since ant is very slow, 'ant javac' at least compiles the server-side stuff) before committing. You'll note that when I applied Prageeth's changes last night, I did them in a very atomic way. Revision 45<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/45>is now *solely* for moving files. If you ever move files, it is a good idea to do nothing else in that commit other than the move. Note that you can bzr mv a directory rather than a lot of individual files -- that makes the log look much cleaner. Note that in r45, I still modified a *huge* number of files, but I didn't do anything other than fix up the package imports, which is necessary for r45 to compile successfully. I committed all of the remaining changes in revision 46<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/46>, so it is now possible to see exactly what Prageeth changed in those files which were previously moved. I also noticed in the changes he emailed me that Role was moving out of UserGroups, and that was causing quite a few package imports to change, so I decided to separate that out from the remaining changes and commit it on its own<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/50>. I note that the subsequent 4 commits by Prageeth were perfectly reasonable, so good job. *Tools* Finally, some tools for working with diffs. Now there are often times when you are working on a big change and you find that you need to make a small fix and commit that as a tiny commit. "bzr shelve" is incredibly useful for this; it's how I managed to take Prageeth's huge commit and separate out different parts. You just type "bzr shelve" and then it will go through the whole diff and ask you "Shelve this change?" You just hit y or n (yes or no) for each piece of the diff. For everything you hit 'y' on, it will temporarily undo that change, reverting it back to the way it was before you changed it, but it will store the change in the "shelf". So you should hit 'y' on everything you *do not want to commit*, and 'n' for everything you do want to commit. Then you can commit just the changes you need. Afterwards, run "bzr unshelve" to bring all the shelved changes back. "bzr shelve --list" shows everything you have shelved. Importantly, you need to reload the files in your text editors after using shelve or unshelve, or they will be out of date and chaos will ensue. Also you might want to start looking at branches. Branches are a fairly advanced concept, but they are useful when you are working on something that is simply too big to commit all in one go, but you can't commit in pieces or it will break things. (Generally, new features.) Prageeth, you mentioned that you have a large upcoming database change that won't be ready for a few days. This scares me, as it implies many thousands of lines will change. It also implies you will go days without committing, which means lots of uncommitted changes that have a habit of going missing. What you might want is a branch. A branch is just like a copy of the repository, where you are free to break things. In 'trunk' you should never commit code that doesn't work. But in a branch, you can commit half-implemented code or even code that doesn't compile, and nobody will yell at you. When you are finished with the branch, merge it back into trunk. It will show up as a single huge commit, but it will have a lot of sub-commits showing the individual steps, so it doesn't break the above guidelines. To do this, you need to create a new directory for the branch (which is why I recommend you have a directory "mugle" with a subdirectory "trunk" with your checkout, then you can put other branches in the "mugle" directory). So for example, come up with a name for the branch, such as "database-wrapper" and go to the mugle directory: mugle$ bzr branch trunk my-branch # (my-branch is the name of the branch) mugle$ cd my-branch mugle/my-branch$ bzr push lp:~mugle-dev/mugle/my-branch # (my-branch is the name of the branch) mugle/my-branch$ bzr bind Now you can work in my-branch. Commits will still be public on Launchpad but they will not interfere with the main trunk. When you are finished implementing the branch: mugle$ cd trunk mugle/trunk$ bzr merge ../my-branch Now this merges the branch with the trunk and you should deal with any conflicts and make sure it still compiles. Then simply bzr commit to finish off the branch -- now the branch changes are integrated with trunk. I can demonstrate either of these tools in person if you would like to use them. One final point, just for house-keeping. If the commit you are making fixes a bug, please add to the "bzr commit" line --fixes=lp:BUG-ID. For example, in revision 39, I added --fixes=lp:722957. This helps link the bugs to the commits so we can see where they were closed off. If you forget, it isn't a big deal, but it's helpful to do that. Thanks for reading. As I said, I don't expect you to follow all of the above guidelines perfectly, and don't let it interfere with your work too much. But, I don't want to have to read any more 1000-line+ diffs. Matt
-- Mailing list: https://launchpad.net/~mugle-dev Post to : [email protected] Unsubscribe : https://launchpad.net/~mugle-dev More help : https://help.launchpad.net/ListHelp

