Re: [racket-dev] proposal for moving to packages: repository
On Friday, Matthew Flatt wrote:
At Fri, 24 May 2013 12:44:35 -0400, Eli Barzilay wrote:

* The script should also take care to deal with files that got removed in the past.

Ditto.

I don't believe that it's *not* doing this, so I did the double-check in the form of a test.

You're right --- I misunderstood your example.

(BTW, in case it wasn't obvious -- that was a typo, since the script is not doing it...)

Still, I'm happy enough with the result in your example. The conversion does preserve `git log --follow' results for the files that survive, which was my intended spec. And maybe it's better to explain my interest as `git blame', since my main interest in the history of a file is often how/why a particular bit of code ended up as it is.

Ah, yes -- in that case, I think that it's doing that (= maintaining the blame information) fine, but there are still things that you'll want to keep. (At least for some value of you...) (I'll get back to this later, since it's the main content.)

* filter-branch one time using your script to reorganize the files according to packages
* use filter-branch with a subdirectory filter 5 times to create each repository
Total runtime: about 21 hours

This latter use would end up with the final tree being exactly the same (since you're talking about doing the reorganization within git), but the history would be different since it's as if the files were there the whole time.

I don't see how that works. Since my script leaves each file in its original location for old commits, I expect a subdirectory `filter-branch' to still drop history for the old locations. In any case, I'm happy to sort out that detail later.

Ah yes, keeping the files in-place instead of shuffling them around is definitely much better. And yes, it means that it *will* take that large chunk of time for each extracted repository, but I think that it's definitely worth the effort.
(Once there is a good way to do the whole trimming thing, I can easily script that onto a bunch of lab machines to do it all in parallel.)

If we agree that `git mv' before splitting is practical, though, that's all I need for now.

Yes -- with all of the above, and with the additional improvements that I'll suggest below. Actually, I'll just send that in a new email since it's long enough.

From my perspective, the important thing is to have the ability to just edit and move files around to sort out packages, instead of having the indirection of a script that edits and moves files around.

OK -- but I still think that it's worth it to save a second major change for people, and given that you've started with a suggestion for package splitting, maybe just go on with revising that for a short time and then just do the splitting without an intermediate period? For people who want to keep dealing with the whole tree, the layout is going to be the same so there won't be much difference anyway, and people who want to deal with just their corner will get more time to adjust and enjoy the benefits of dealing with just their corner quicker.

BTW, it will potentially lead to more problems where my change to my own repo goes fine and I don't know that it got broken because of a change elsewhere since I didn't keep the other files in git form -- but this makes me think that the next release might be prone to such issues, so it's better to start earlier with the segregation rather than doing it later. (But OTOH, the builds and drdr will keep a high level of problem prevention, I hope.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life!
Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
Now for the problems that are likely worth paying attention to, and suggestions for improving things...

The quick summary of what I'm going to say is that I think that there's a significant improvement that can be done with some more work, one that requires some minimal manual intervention. Because of this, I think that it's best to work with a whole repository database of file movements, which will be made automatically, but revise-able manually to fix things. Your scripts will change to parse this file instead of running git directly, but since the format will be uniform, this should be easy to adjust.

And a point of clarification: as you noted, these problems are not things that you'll see in blames now. For example, cases of misidentification are in many places obviously nonsense, and real cases are rare. Another example is if there's a commit that removed a bunch of code that you want to go over: currently, you'll see the commit that removed a file in your history and the removed file is visible in that commit but it won't be if it's truncated away. I'll repeat here that I'm personally fine with not doing any of this, but I think that most people do care about losing these bits.

Also, note that some of these problems are likely to go away in some future git (for example, search for fractions in the below problems to see a feature that git doesn't have now but might improve in the future), so an improved future blame will actually produce better output when things are fixed manually even though currently the result won't differ as much with these fixes.

A good starting point for the whole-repo database of file movements is:

  git log --date-order --format='%n%h %ai %s' \
          --name-status -M -C --find-copies-harder -l2 -B

For reference, I've put this output here: http://tmp.barzilay.org/git-log.txt

I'm thinking of starting with this text, and manually fixing things like removing bogus copies/moves, and adding ones that git missed.
In addition, there should be some enrichment to the format, to specify where deleted files go -- so it's possible to go over removed files (everything that starts with D) and assign them to package repos. (Many of them are easy to do since their destination package is obvious.)

The following is a list of problem examples, which can be addressed as above.

Here is a problem where some potentially useful history is lost:

  2a94ca9 Eric Dobson (3 weeks ago) Cleanup tc-lambda-unit.
    M collects/typed-racket/typecheck/tc-lambda-unit.rkt
    D collects/typed-racket/typecheck/parse-cl.rkt
  c25ed74 Stephen Bloch (7 weeks ago) Moved error-message tests into a module+ in main source file.
    M collects/picturing-programs/private/tiles.rkt
    D collects/picturing-programs/tests/tiles-error-tests.rkt
  1838953 Vincent St-Amour (5 months ago) Move define-inline to racket/performance-hint.
    M collects/scribblings/reference/syntax.scrbl
    D collects/unstable/scribblings/inline.scrbl

= In these cases, the second file got cleaned up into the first, but git considers them unrelated by default, so the history of the first is lost if it is not kept explicitly.

  9f337c6 Jay McCarthy (10 weeks ago) Removing the planet2 name from the code
    A collects/tests/pkg/tests-checksums.rkt
    A collects/tests/pkg/tests-conflicts.rkt
    A collects/tests/pkg/tests-deps.rkt
    D collects/tests/planet2/tests-checksums.rkt
    D collects/tests/planet2/tests-conflicts.rkt
    D collects/tests/planet2/tests-deps.rkt
    ... lots of these ...

= In these cases files got renamed with enough changes that git misses the fact that they were renamed. (BTW, for this reason I recommended that renames are done without other modifications, with the modifications going in a separate commit.) It might help the above to lower the similarity threshold, but the first problem is that git measures changes in relation to the overall file size, so if the second file is big enough, it will not help.
Also, there are these problems:

  198a65a Matthew Flatt (13 days ago) raco pkg create: support source and binary bundling
    C100 collects/launcher/shcollects/tests/pkg/test-pkgs/pkg-x/nobin-top.txt
    ...
  6c1e163 Matthew Flatt (1 year, 2 months ago) add missing jfp.css
    C100 collects/launcher/shcollects/scribble/jfp/jfp.css

= Empty files are an obvious problem here, since they are 100% similar, and therefore considered a copy of some random empty file.

Cannot just ignore empty files, since it happens in other files too:

  fae660b Jay McCarthy (7 months ago) Release Planet 2 (beta)
    C056 collects/meta/drdr2/analyzer/analyzer.rkt collects/tests/planet2/test-pkgs/planet2-test1-conflict/planet2-test1/conflict.rkt
    ... many more ...
  b2b5875 Blake Johnson (2 years, 7 months ago) replacing self modidx refs and tests
    C092 collects/meta/drdr2/analyzer/analyzer.rkt collects/tests/compiler/demodularizer/tests/racket-5.rkt

=
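The reviewable database of file movements described in this message could start from a small parser over the `--name-status` output. The following is only a sketch in Python (the thread's own scripts are Racket): the `Commit` class, the function names, and the sample text are all hypothetical, and it assumes that status lines are tab-separated while every other non-blank line is a commit header. It lists the two kinds of entries that the message singles out for manual attention: 100% copies and deleted files.

```python
import re
from dataclasses import dataclass, field

# One --name-status line: status letter, optional similarity score, tab, path(s).
STATUS_RE = re.compile(r"^([ACDMRT])(\d{3})?\t(.+)$")

@dataclass
class Commit:
    header: str                                   # header line as printed by --format
    changes: list = field(default_factory=list)   # (status, score, paths) tuples

def parse_log(text):
    """Group --name-status lines under the commit header that precedes them."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = STATUS_RE.match(line)
        if match and commits:
            status, score, rest = match.groups()
            paths = tuple(rest.split("\t"))
            commits[-1].changes.append((status, int(score) if score else None, paths))
        else:
            commits.append(Commit(header=line))
    return commits

def review_candidates(commits):
    """Yield (header, reason, paths) for entries that probably need a human decision."""
    for commit in commits:
        for status, score, paths in commit.changes:
            if status == "C" and score == 100:
                yield commit.header, "exact copy, possibly of an empty file", paths
            elif status == "D":
                yield commit.header, "deleted file needs a destination package", paths

# Hypothetical sample: hashes and paths are stand-ins, not real log output.
SAMPLE = (
    "\n1111111 2013-05-01 10:00:00 -0400 Cleanup\n\n"
    "M\tpkg/a.rkt\n"
    "D\tpkg/old.rkt\n"
    "\n2222222 2013-04-01 10:00:00 -0400 add stub\n\n"
    "C100\tpkg/empty.txt\tother/stub.txt\n"
)

for header, reason, paths in review_candidates(parse_log(SAMPLE)):
    print(header.split()[0], reason, " -> ".join(paths))
```

Keeping the parsed result as plain records is what would make the manual step practical: a bogus copy can be deleted from the list, and a missed rename can be added by hand, before any script consumes it.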
Re: [racket-dev] proposal for moving to packages: repository
Yesterday, Robby Findler wrote:

Hi Eli: I'm trying to understand your point. Do I have this right?

Background: The git history consists of a series of checkpoints in time of the entire repository, not a collection of individual files.

Yes, although the difference between entire repository and individual files is mostly theoretical. The main point is that the log history is made from changes to *content* -- you can't have some made up history planted artificially for a file. (And it is the same for most CMSs if not all; the main difference here is that git doesn't keep meta information about copying and renaming.)

So, when I do git log x.rkt then what I get is essentially a filtered list (except where people didn't properly rebase, but let's ignore that) of those checkpoints: all the ones where x.rkt changed.

Exactly. (I don't get the rebase comment though -- even without rebasing what you get is this filtered history.)

Big Question: The issue is, then, when we split up the current repo into smaller repos, what are the series of checkpoints that we're going to make up for the individual repos? Right?

Yes, but it can get a bit subtle. Like I said, the `filter-branch' tool is basically replaying the entire history, giving you points to inject hooks that can modify the tree or the commits, etc.

Note that in all uses that were mentioned, there was a --prune-empty flag, which means that commits that didn't have any change are dropped. I'm mentioning this because some people might have an illusion that it's better to *not* do that and keep these commits. Here's an example why this is not useful: say that you have this edit sequence:

  foo@somewhere creates A/x, with log message "created x"
  bar@somewhere edits it, with log message "edited x"
  baz@somewhere moved it into B/x, with log "renamed A to B"

If at this point you use any git tools, they can see the real history. For example, you can use `blame' to see which lines were written by which user.
Also, assuming that these are all the changes, a git log will show the three commits as they appear above.

Now, if you use filter-branch to modify the repository and keep only the B directory, but you *don't* use the --prune-empty flag: the fact that you want to keep these other commits won't help -- the full history would have the three commits with the same three messages, but doing a log for just the file would show only the commits for the file, so the first two commits won't be shown. Similarly, blame can't show anything useful -- you'll only see baz@somewhere as the author of the entire file. And the reason this makes sense is that the full commit history has the first two commits, but they had no change -- so there's nothing that ties them to the file in the trimmed repository, let alone something that relates them to specific lines in the file...

(Two notes: (a) This is just a demonstration -- obviously, this is a trimming that is done in a bad way since it dropped A even though it's part of the history of B. (b) Actually, it looks like the --subdirectory-filter drops empty commits anyway, but the above explains why it makes sense to do that.)

Your Advice: And, IIUC, you're suggesting that the best way to deal with this question is to defer it until we are more sure of the actual split we want to make. So we don't mess with the history at all

The point is that every such messing-with-history should be done very carefully and checked thoroughly, since the chance to mess things up is very real. In the above, it's obvious that I should not have dropped A in the filter -- but if it's some random single file which you had in the framework collection, out of tons of other files in the drracket package, then it's unlikely that I will catch it -- which is why I prefer using tools for these things and resolve all such issues with the people who know about the code.
and instead just work at the level of some script that we can run to just use mv and company to move things around. When we know exactly what ends up going where, then we can figure out how to make up a new, useful history for the separate repositories. Is that the point?

The thing is that having two such filters (one to restructure the big repository and one to split it) is both increasing chances for making mistakes, and making the job of the second restructure much harder to do. To the point where doing it manually is infeasible, which is why I said that it will guarantee losing history.

(And I'll reply to Matthew's suggested tool next.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life!
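The A/x to B/x example in this message can be replayed on a toy model of the history. This is a sketch in Python, not git itself: each commit is an (author, message, tree) triple holding a full snapshot, and `subdirectory_filter`, `file_log`, and `blame` are hypothetical stand-ins for the corresponding git operations. The blame is deliberately naive and credits each line to the first author whose commit contains it.

```python
# Each commit is a full snapshot of the tree: path -> list of lines.
HISTORY = [
    ("foo@somewhere", "created x", {"A/x": ["line 1"]}),
    ("bar@somewhere", "edited x", {"A/x": ["line 1", "line 2"]}),
    ("baz@somewhere", "renamed A to B", {"B/x": ["line 1", "line 2"]}),
]

def subdirectory_filter(history, subdir, prune_empty):
    """Keep only files under `subdir`, re-rooted; optionally drop no-change commits."""
    prefix = subdir.rstrip("/") + "/"
    result, previous = [], {}
    for author, message, tree in history:
        kept = {p[len(prefix):]: c for p, c in tree.items() if p.startswith(prefix)}
        if prune_empty and kept == previous:
            continue
        result.append((author, message, kept))
        previous = kept
    return result

def file_log(history, names):
    """Messages of commits that changed the file, under any of its known names."""
    messages, previous = [], None
    for _, message, tree in history:
        current = next(((n, tree[n]) for n in names if n in tree), None)
        if current != previous:
            messages.append(message)
        previous = current
    return messages

def blame(history, names):
    """Naive blame: credit each final line to the first author whose commit has it."""
    credit = {}
    for author, _, tree in history:
        for line in next((tree[n] for n in names if n in tree), []):
            credit.setdefault(line, author)
    last_tree = history[-1][2]
    final = next(last_tree[n] for n in names if n in last_tree)
    return [(credit[line], line) for line in final]

print(file_log(HISTORY, ["A/x", "B/x"]))   # all three messages
trimmed = subdirectory_filter(HISTORY, "B", prune_empty=False)
print(len(trimmed), file_log(trimmed, ["x"]), blame(trimmed, ["x"]))
```

In the trimmed history all three commits survive, yet the log for the file shows only the rename and the blame credits baz@somewhere with every line, which is the outcome described above: the two retained commits carry no change that ties them to the file.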
Re: [racket-dev] proposal for moving to packages: repository
8 hours ago, Matthew Flatt wrote:
At Thu, 23 May 2013 07:09:17 -0400, Eli Barzilay wrote:

Relevant history is vague.

The history I want corresponds to `git log --follow' on each of the files that end up in a repository.

(In this context this is clear; the problem in Carl's post is that it seemed like he was suggesting keeping the whole repository and doing the split by removing material from clones -- which is an even fuller history, but one that has large parts that are irrelevant.)

That's true if you use `git filter-branch' in a particular way. I'll suggest an alternative way, which involves filtering the set of files in a commit-specific way. That is, the right set of files to keep for each commit are not the ones in the final place, but the ones whose history we need at each commit.

If that can be done reliably, then of course it makes it possible to do the split reliably after the first restructure. It does come with a set of issues though...

[... scripts description ...]

Here are a bunch of things that I thought about as I went over this. In no particular order, probably not exhaustive, and possibly repetitive:

* Minor: better to use `find-executable-path' since it's common to find systems (like mine) with an antique git in /usr/bin and a modern one elsewhere. (In my case, both scripts failed since /usr/bin has an antique version.)

* There is an important point of fragility here: you're relying on git to be able to find all of the relevant file movements (renames and copies), which might not always be correct. On one hand, you don't want to miss these operations, and on the other you don't want to have a low-enough threshold to identify bogus copies and renames.

* Because of this, I think that it's really best to inspect the results manually. The danger of bogus copies, for example, is real, especially with small and very boilerplate-ish files like info.rkt files.
If there's a mistaken identification of such a copy you can end up with a bogus directory kept in the trimmed repo. In addition, consider this information that the script detects via git for a specific commit:

  A/f1.ss renamed to B/f1.rkt
  A/f2.ss renamed to B/f2.rkt
  ...
  A/f47.ss renamed to B/f47.rkt
  A/f48.ss renamed to B/f48.rkt
  A/f49.ss deleted
  A/f50.ss deleted
  B/f49.rkt created
  B/f50.rkt created

For a human reviewer, it's pretty clear that this is just a misidentification of two more moves (likely to happen with the kind of restructures that we did in the past, where a single commit both moves a file, and changes its contents). This is why on one hand I *really* like to use such scripts (to make sure that I don't miss such things), but OTOH I want to review the analysis results to see potential problems and either fix them manually or figure out a way to improve the analysis and run it again.

* Also, I'd worry about file movements on top of paths that existed under a different final path at some point, and exactly situations like you described, where a file was left behind, but that file is completely new and should be considered separate (as in the case of a file move and a stub created in its place).

* The script should also take care to deal with files that got removed in the past. For example, the drscheme collection has some file which gets removed, and later (completely unrelated) most of the contents migrated to drracket. If the result of the analysis is that most of the material moved this way, and because of that you decide to keep the old drscheme collection -- you'd also want to keep that file that disappeared before the move, since it's part of the relevant history. So I'd modify this script to run on the *complete* repository -- the whole tree and all commits -- and generate information about movements.
Possibly do what your script does for the whole tree, then add a second step that runs and looks for such files that are unaccounted for in the results, and decide what to do with them. I think that this also means that it makes sense to create a global database of all file movements in a single scan, instead of running it for each package.

* Technical: I thought that it might make sense to use a racket server (with netcat for the actual command), or have it compile a /bin/sh script to do the actual work instead of using `racket/kernel' for speed. However, when I tried it on the plt tree, it started with spitting out new commits rapidly, but eventually slowed down to more than a second between commits, so probably even the kernel trick is not helping much...

* Actually, given the huge amount of time it's running (see next bullet), it's probably best to make it do the movements from all paths at the same time. In this specific context, this means that it scans the package-restructured repo (from the first step) into a package-restructured repo
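The manual review of A/f49.ss-style leftovers discussed earlier in this message could be assisted by a simple name-based check. The following Python sketch is an assumption rather than part of the scripts under discussion: `likely_renames` is a hypothetical helper that pairs files deleted and created in the same commit when their names match once the directory and extension are ignored. It only suggests candidates for a person to confirm, and the sample assumes the two created files are B/f49.rkt and B/f50.rkt.

```python
import posixpath

def stem(path):
    """File name without its directory or extension: 'A/f49.ss' -> 'f49'."""
    return posixpath.splitext(posixpath.basename(path))[0]

def likely_renames(deleted, created):
    """Pair deleted and created paths from one commit that share a stem.

    Only unambiguous matches are suggested; everything else is left for review.
    """
    by_stem = {}
    for path in created:
        by_stem.setdefault(stem(path), []).append(path)
    pairs = []
    for path in deleted:
        candidates = by_stem.get(stem(path), [])
        if len(candidates) == 1:
            pairs.append((path, candidates[0]))
    return pairs

# Leftovers of a commit where git already detected 48 renames from A/ to B/.
deleted = ["A/f49.ss", "A/f50.ss"]
created = ["B/f49.rkt", "B/f50.rkt"]
for old, new in likely_renames(deleted, created):
    print(f"{old} probably moved to {new}")
```

A check like this says nothing about content, so it cannot replace the similarity analysis; its role is to surface the cases where that analysis gave up because the file was both moved and edited.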
Re: [racket-dev] proposal for moving to packages: binary vs source
[Note subject change...]

Two days ago, Eric Dobson wrote:

For binary vs source, I think you are providing a good argument for the usefulness of a no source distribution. Some people want to use tools written in Racket, and the fact that the tools are written in Racket is immaterial to them. They should be able to have just the binary versions.

There have been a bunch of concerns expressed about the question of distributing sources or not -- but I think that generally speaking, there shouldn't be any problems at all. Here's a list of things that contribute to not having such concerns:

1. The eventual goal would be to have very easy selection of packages that you want to install. Either with (a) a bunch of installers, (b) possibly doing this by just a different URL that will have the installers listed in it as arguments, or (c) with a post-install dialog that will ask you for additional packages to install. (In the (c) case, it could also detect packages that you had decided to install previously, and re-use the same list.)

   The bottom line is that if *you* want to get the sources, then it should be extremely easy to just have them installed (c), or create installers that include the sources (a;b) which you'll use. The main point here is that using packages will make such variations very easy to implement, and make it easy for you to add sources or provide popular options based on demand.

2. With the geiser/drracket concern about reduced functionality because there are no sources: the information about the source of bindings is still there. (Ie, things work fine if you remove a random source file from a current installation -- the only difference is that the actual source file is not there.) Now, I'm assuming that there is some way with the package system to know for any given file which package it came from.
   With this information, I think that it would be easy to do something like this:

   * In drr, if you try to jump to a definition for a function whose source is not included, you get a popup telling you that you don't have the source, and list an on-line URL where the source can be found (which is inferrable from the package information) as well as a one-button-click option to install the source and then open the file.

   * Geiser could do exactly the same, and also use something like `url-handler-mode' to visit the source file directly from the on-line source in addition to offering to install the sources.

3. I think that there should be an option for package owners to decide how their package gets installed, so for example, if realm must be distributed with its sources, it can just specify that and avoid the stripping that other packages would go through.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] proposal for moving to packages: repository
At Fri, 24 May 2013 03:26:45 -0400, Eli Barzilay wrote:

If that can be done reliably, then of course it makes it possible to do the split reliably after the first restructure.

Great! Let's do that, because I remain convinced that it's going to be a lot easier.

* Also, I'd worry about file movements on top of paths that existed under a different final path at some point

I believe the file-lifetime computation in slice.rkt takes care of that.

* The script should also take care to deal with files that got removed in the past.

Ditto.

* Actually, given the huge amount of time it's running (see next bullet), it's probably best to make it do the movements from all paths at the same time.

There's no need to move anything while extracting a repository slice; the movements happen before.

* It's not clear to me what you want to do at this point, [...] Alternatively, do the first restructure with in-repo moves instead,

Yes, that's what I suggested.
Re: [racket-dev] proposal for moving to packages: repository
Four hours ago, Matthew Flatt wrote:
At Fri, 24 May 2013 03:26:45 -0400, Eli Barzilay wrote:

If that can be done reliably, then of course it makes it possible to do the split reliably after the first restructure.

Great! Let's do that, because I remain convinced that it's going to be a lot easier.

I'm really surprised. Given that you consider this a *lot* easier, and that I consider it (reorganization + split) a lot messier, I think that I'm still not getting something.

* Also, I'd worry about file movements on top of paths that existed under a different final path at some point

I believe the file-lifetime computation in slice.rkt takes care of that.

That's what it looks like, but I'd double-check to make sure that it happens.

* The script should also take care to deal with files that got removed in the past.

Ditto.

I don't believe that it's *not* doing this, so I did the double-check in the form of a test. When I run it, I see these bad things (which I expected to happen, so wrote it as a test):

* The c file got completely lost (this is the pre-reorganization file deletion scenario)

* The b file got lost too (post-reorg deletion)

* The history of e during the A days got lost, since it was not recognized as a rename in the A-B move due to being edited too.

= The first two are things that a script can deal with doing some kind of scan like I mentioned (go over the full history of the full tree).

= The third one is something that requires human judgment *but* if the A/e historic file is considered as deleted, and if deleted files from the original directories are included with doing the above, then it should still be there in the rewritten repo.

Test file attached; probably need to do very little other than adjusting the paths to the two racket scripts.

* Actually, given the huge amount of time it's running (see next bullet), it's probably best to make it do the movements from all paths at the same time.
There's no need to move anything while extracting a repository slice; the movements happen before.

What I'm saying is that if filter-branch using your script takes 20 hours, and you want to use it to split the repo to 5 packages, and if a simple filter-branch with a subdirectory filter takes a few minutes, then instead of:

* filter-branch using your script 5 times to create each repository
Total runtime: more than 4 days

you do this:

* filter-branch one time using your script to reorganize the files according to packages
* use filter-branch with a subdirectory filter 5 times to create each repository
Total runtime: about 21 hours

This latter use would end up with the final tree being exactly the same (since you're talking about doing the reorganization within git), but the history would be different since it's as if the files were there the whole time.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life!
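The two runtime totals in this message follow from simple arithmetic, spelled out here as a small Python sketch. The 20-hour figure and the count of 5 packages come from the message; the 12 minutes per subdirectory filter is an assumption standing in for "a few minutes", chosen because it reproduces the quoted total of about 21 hours.

```python
def per_package_hours(packages, script_hours):
    """Run the slow history-preserving script once for every package."""
    return packages * script_hours

def two_step_hours(packages, script_hours, subdir_minutes):
    """Run the slow script once, then a quick subdirectory filter per package."""
    return script_hours + packages * subdir_minutes / 60

PACKAGES = 5
SCRIPT_HOURS = 20
SUBDIR_MINUTES = 12   # assumption: the message only says "a few minutes"

slow = per_package_hours(PACKAGES, SCRIPT_HOURS)
fast = two_step_hours(PACKAGES, SCRIPT_HOURS, SUBDIR_MINUTES)
print(f"script once per package: {slow} hours ({slow / 24:.1f} days)")
print(f"script once, then subdirectory filters: {fast:.0f} hours")
```

The saving grows with the number of packages, since only the cheap step is repeated. Whether the two-step route preserves the same history is a separate question that this calculation does not address.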
Re: [racket-dev] proposal for moving to packages: repository
At Fri, 24 May 2013 12:44:35 -0400, Eli Barzilay wrote:

* The script should also take care to deal with files that got removed in the past.

Ditto.

I don't believe that it's *not* doing this, so I did the double-check in the form of a test.

You're right --- I misunderstood your example.

Still, I'm happy enough with the result in your example. The conversion does preserve `git log --follow' results for the files that survive, which was my intended spec. And maybe it's better to explain my interest as `git blame', since my main interest in the history of a file is often how/why a particular bit of code ended up as it is.

What I'm saying is that if filter-branch using your script takes 20 hours

Just to confirm, my experiment on the main repo completed in right at 20 hours. (The `git log --follow's and `git blame's that I tried look good to me.)

* filter-branch one time using your script to reorganize the files according to packages
* use filter-branch with a subdirectory filter 5 times to create each repository
Total runtime: about 21 hours

This latter use would end up with the final tree being exactly the same (since you're talking about doing the reorganization within git), but the history would be different since it's as if the files were there the whole time.

I don't see how that works. Since my script leaves each file in its original location for old commits, I expect a subdirectory `filter-branch' to still drop history for the old locations. In any case, I'm happy to sort out that detail later.

If we agree that `git mv' before splitting is practical, though, that's all I need for now. From my perspective, the important thing is to have the ability to just edit and move files around to sort out packages, instead of having the indirection of a script that edits and moves files around.
Re: [racket-dev] proposal for moving to packages: repository
9 hours ago, Matthew Flatt wrote:
At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:

That's true, but the downside of changing the structure and having files and directories move post structure change will completely destroy the relevant edit history of the files, since it will not be carried over to the repos once it's split.

It's possible that we're talking past each other due to me not getting this point. (Obligatory re-disclaimer: I consider the problem with forcing people to change their working environment much more severe.)

Why is it not possible to carry over history? The history I want corresponds to `git log --follow' on each of the files that end up in a repository. I'm pretty sure that such a history of commits can be generated for any given set of files, even if no ready-made tool exists already (i.e., 'git' is plenty flexible that I can script it myself). Or maybe I'm missing some larger reason?

The thing to remember is just how simple git is... There's no magical way to carry over a history artificially -- it's whatever is in the commits. To make this more concrete (and more verbose), in this context the point is that git filter-branch is a simple tool that basically replays the complete history, allowing you to plant various hooks to change the directory structure, commit messages or whatever. The new history is whatever new commits are in the revised repository, with no way to make up a history with anything else.

Now, to make my first point about the potential loss of history that is inherent in the process -- say that you want to split out a drracket repo in a naive way: taking just that one directory. Since it's done naively, the resulting repository will not have the drscheme directory and its contents, which means that you lose all history of files that happened there.
To try that (in a fresh clone, of course) -- first, look at the history of a random file in it:

  F=collects/drracket/private/app.rkt
  git log --format='%n%h %s' --name-only --follow -- $F

Now do the revision:

  S=collects/drracket
  git filter-branch --prune-empty --subdirectory-filter $S -- --all

And look at the same log line again, the history is gone:

  git log --format='%n%h %s' --name-only --follow -- $F

If you look at the *new* file, you do see the history, but the revisions made in drscheme are gone:

  git log --format='%n%h %s' --name-only --follow -- private/app.rkt

In any case, this danger is there no matter what, especially in our case since code has been moving around in the racket switch. I *hope* that most of it will be simple: like carrying along the drscheme directory with drracket, the scheme and mzlib with racket, etc. Later on, if these things move to compat packages, the irrelevant directories get removed from the repo without surgeries, so the history will still be there.

This shows some of the tricks that might be involved in the current switch: if you'd want to have some compat package *now*, the right thing to do would be:

* do a simple filter-branch to extract drscheme (and other such collections) in a new repository for compat

* for drracket: do a filter-branch that keeps *both* directories in, then commit a removal of drscheme. (Optionally, use rebase to move the deletion backward...)

Going back to the repo structure change that you want and the reason that I said that doing moves between the package directories post-restructure is destructive should be clear now: say that you move collects/A/x into foo/A/x as part of the restructure. Later you realize that A/x should go into the bar package instead so you just move it to bar/A/x. The history is now in, including the rename, but later on when bar is split into a separate repo, the history of the file is gone. Instead, it appears in the foo repository, ending up being deleted.
One way to get around this is to avoid moving the file -- instead, do another filter-branch surgery. This will be a mess since each such change will mean rebuilding the repository with all the pain that this implies.

Another way to get around it is to keep track of these moving commits, and when the time comes to split into package repos, you first do another surgery on the whole repo which moves foo/A/x to bar/A/x for all of the commits before the move (not after, since that could lead to other problems), and then do the split. This might work, but besides being very error-prone, it means doing the same kind of file-movement tracking that I'm talking about anyway.

So take this all as saying that the movement of files between packages needs to be tracked anyway -- but with my suggestion the movement is delayed until it's known to be final before the repo split, which makes it more robust overall.

But really, the much more tempting aspect for me is that this can be done now -- if you give me a list of packages and files, I can already do the movement script. Actually, in an
Re: [racket-dev] proposal for moving to packages: repository
9 hours ago, Carl Eastlund wrote: I was going to comment on the same thing. While a naive use of git filter-branch might not retain the history, it should be entirely possible to do something a little more intelligent and keep that history. Just to be clear, this is exactly what you can't get with filter-branch. Essentially each of the new repositories could keep the entire history of the original repository, followed by a massive move/rename, then moving forward with an individual package. This can work, but it is unrelated to filter-branch: it's basically starting each package repository from a clone of the monolithic repo, then shuffling things around. This seems wrong to me in all kinds of ways -- but if someone wants to do this with *their* package (ie, not a package that I need to deal with), then it's certainly an option. (That's one of the big appeals of moving to packages for me: some code moves to packages which I can let myself Not Care About™. Knock yourself out with tabs, spaces at ends of lines, braces in code, two spaces between bindings and values in `let's, and make sure that no file ends with a newline...) -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote: 9 hours ago, Carl Eastlund wrote: I was going to comment on the same thing. While a naive use of git filter-branch might not retain the history, it should be entirely possible to do something a little more intelligent and keep that history. Just to be clear, this is exactly what you can't get with filter-branch. Essentially each of the new repositories could keep the entire history of the original repository, followed by a massive move/rename, then moving forward with an individual package. This can work, but it is unrelated to filter-branch: it's basically starting each package repository from a clone of the monolithic repo, then shuffling things around. This seems wrong to me in all kinds of ways -- but if someone wants to do this with *their* package (ie, not a package that I need to deal with), then it's certainly an option. It doesn't seem wrong to me. It's an accurate representation of the history of the project, which is exactly what git is for retaining. Where does the problem come from? If git filter-branch doesn't maintain the history we need, it's not the right tool for the job. --Carl _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
A few minutes ago, Carl Eastlund wrote: On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote: 9 hours ago, Carl Eastlund wrote: I was going to comment on the same thing. While a naive use of git filter-branch might not retain the history, it should be entirely possible to do something a little more intelligent and keep that history. Just to be clear, this is exactly what you can't get with filter-branch. Essentially each of the new repositories could keep the entire history of the original repository, followed by a massive move/rename, then moving forward with an individual package. This can work, but it is unrelated to filter-branch: it's basically starting each package repository from a clone of the monolithic repo, then shuffling things around. This seems wrong to me in all kinds of ways -- but if someone wants to do this with *their* package (ie, not a package that I need to deal with), then it's certainly an option. It doesn't seem wrong to me. It's an accurate representation of the history of the project, which is exactly what git is for retaining. Where does the problem come from? The problem of filter-branch? It has no problems, it does exactly what it is supposed to do. If git filter-branch doesn't maintain the history we need, it's not the right tool for the job. If the drracket files are irrelevant for the swindle package then they shouldn't be in the swindle repository -- and on the exact same token, the development history of drracket shouldn't be there either. (This is not new, BTW, I think that there was general consensus right from the start of the package talk that the monolithic repo is just a host for a bunch of separate projects.) -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote: A few minutes ago, Carl Eastlund wrote: On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote: 9 hours ago, Carl Eastlund wrote: I was going to comment on the same thing. While a naive use of git filter-branch might not retain the history, it should be entirely possible to do something a little more intelligent and keep that history. Just to be clear, this is exactly what you can't get with filter-branch. Essentially each of the new repositories could keep the entire history of the original repository, followed by a massive move/rename, then moving forward with an individual package. This can work, but it is unrelated to filter-branch: it's basically starting each package repository from a clone of the monolithic repo, then shuffling things around. This seems wrong to me in all kinds of ways -- but if someone wants to do this with *their* package (ie, not a package that I need to deal with), then it's certainly an option. It doesn't seem wrong to me. It's an accurate representation of the history of the project, which is exactly what git is for retaining. Where does the problem come from? The problem of filter-branch? It has no problems, it does exactly what it is supposed to do. It has no problems? Where above you stated "this is exactly what you can't get with filter-branch" in reference to keeping our packages' relevant history. That sounds like a problem to me, in our current context. But filter-branch is not what I was talking about. I was talking about _not_ using filter-branch, and instead doing something that does keep history. If git filter-branch doesn't maintain the history we need, it's not the right tool for the job. If the drracket files are irrelevant for the swindle package then they shouldn't be in the swindle repository -- and on the exact same token, the development history of drracket shouldn't be there either. 
(This is not new, BTW, I think that there was general consensus right from the start of the package talk that the monolithic repo is just a host for a bunch of separate projects.) Okay, then let's purge the history of irrelevant files, but keep the history of relevant files even if they weren't in the right directory. If the monolithic repo is just a host for a bunch of separate projects, shouldn't it be possible to tease out their more-or-less separate histories? --Carl _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
Just now, Carl Eastlund wrote: On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote: A few minutes ago, Carl Eastlund wrote: It doesn't seem wrong to me. It's an accurate representation of the history of the project, which is exactly what git is for retaining. Where does the problem come from? The problem of filter-branch? It has no problems, it does exactly what it is supposed to do. It has no problems? Where above you stated "this is exactly what you can't get with filter-branch" in reference to keeping our packages' relevant history. Relevant history is vague. The thing that you can't do with filter-branch is keep the complete history if you remove files from the history -- the files that are gone go with their history. But filter-branch is not what I was talking about. I was talking about _not_ using filter-branch, and instead doing something that does keep history. Like I said: what you're suggesting means keeping the full monolithic history of development in the main repo, including all of the irrelevant files (which will be removed in the tip, but included in the repo). If git filter-branch doesn't maintain the history we need, it's not the right tool for the job. If the drracket files are irrelevant for the swindle package then they shouldn't be in the swindle repository -- and on the exact same token, the development history of drracket shouldn't be there either. (This is not new, BTW, I think that there was general consensus right from the start of the package talk that the monolithic repo is just a host for a bunch of separate projects.) Okay, then let's purge the history of irrelevant files, but keep the history of relevant files even if they weren't in the right directory. If the monolithic repo is just a host for a bunch of separate projects, shouldn't it be possible to tease out their more-or-less separate histories? (*sigh*; please read the other email, where I went over this thoroughly.) 
-- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
On Thu, May 23, 2013 at 7:09 AM, Eli Barzilay e...@barzilay.org wrote: Just now, Carl Eastlund wrote: On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote: A few minutes ago, Carl Eastlund wrote: It doesn't seem wrong to me. It's an accurate representation of the history of the project, which is exactly what git is for retaining. Where does the problem come from? The problem of filter-branch? It has no problems, it does exactly what it is supposed to do. It has no problems? Where above you stated "this is exactly what you can't get with filter-branch" in reference to keeping our packages' relevant history. Relevant history is vague. The thing that you can't do with filter-branch is keep the complete history if you remove files from the history -- the files that are gone go with their history. But filter-branch is not what I was talking about. I was talking about _not_ using filter-branch, and instead doing something that does keep history. Like I said: what you're suggesting means keeping the full monolithic history of development in the main repo, including all of the irrelevant files (which will be removed in the tip, but included in the repo). If git filter-branch doesn't maintain the history we need, it's not the right tool for the job. If the drracket files are irrelevant for the swindle package then they shouldn't be in the swindle repository -- and on the exact same token, the development history of drracket shouldn't be there either. (This is not new, BTW, I think that there was general consensus right from the start of the package talk that the monolithic repo is just a host for a bunch of separate projects.) Okay, then let's purge the history of irrelevant files, but keep the history of relevant files even if they weren't in the right directory. If the monolithic repo is just a host for a bunch of separate projects, shouldn't it be possible to tease out their more-or-less separate histories? 
(*sigh*; please read the other email, where I went over this thoroughly.) I just went over all your emails on this topic, and I can't find a single one where you addressed this specific proposal at all. I don't know which one of us is misunderstanding another on this point. --Carl _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
Hi Eli: I'm trying to understand your point. Do I have this right? Background: The git history consists of a series of checkpoints in time of the entire repository, not a collection of individual files. So, when I do git log x.rkt then what I get is essentially a filtered list (except where people didn't properly rebase, but let's ignore that) of those checkpoints: all the ones where x.rkt changed. Big Question: The issue is, then, when we split up the current repo into smaller repos, what are the series of checkpoints that we're going to make up for the individual repos? Right? Your Advice: And, IIUC, you're suggesting that the best way to deal with this question is to defer it until we are more sure of the actual split we want to make. So we don't mess with the history at all and instead just work at the level of some script that we can run to just use mv and company to move things around. When we know exactly what ends up going where, then we can figure out how to make up a new, useful history for the separate repositories. Is that the point? Robby On Thu, May 23, 2013 at 4:41 AM, Eli Barzilay e...@barzilay.org wrote: 9 hours ago, Matthew Flatt wrote: At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote: That's true, but the downside of changing the structure and having files and directories move post structure change will completely destroy the relevant edit history of the files, since it will not be carried over to the repos once it's split. It's possible that we're talking past each other due to me not getting this point. (Obligatory re-disclaimer: I consider the problem with forcing people to change their working environment much more severe.) Why is it not possible to carry over history? The history I want corresponds to `git log --follow' on each of the files that end up in a repository. 
I'm pretty sure that such a history of commits can be generated for any given set of files, even if no ready-made tool exists already (i.e., 'git' is plenty flexible that I can script it myself). Or maybe I'm missing some larger reason? The thing to remember is just how simple git is... There's no magical way to carry over a history artificially -- it's whatever is in the commits. To make this more concrete (and more verbose), in this context the point is that git filter-branch is a simple tool that basically replays the complete history, allowing you to plant various hooks to change the directory structure, commit messages or whatever. The new history is whatever new commits are in the revised repository, with no way to make up a history with anything else. Now, to make my first point about the potential loss of history that is inherent in the process -- say that you want to split out a drracket repo in a naive way: taking just that one directory. Since it's done naively, the resulting repository will not have the drscheme directory and its contents, which means that you lose all history of files that happened there. To try that (in a fresh clone, of course) -- first, look at the history of a random file in it: F=collects/drracket/private/app.rkt git log --format='%n%h %s' --name-only --follow -- $F Now do the revision: S=collects/drracket git filter-branch --prune-empty --subdirectory-filter $S -- --all And look at the same log line again, the history is gone: git log --format='%n%h %s' --name-only --follow -- $F If you look at the *new* file, you do see the history, but the revisions made in drscheme are gone: git log --format='%n%h %s' --name-only --follow -- private/app.rkt In any case, this danger is there no matter what, especially in our case since code has been moving around in the racket switch. I *hope* that most of it will be simple: like carrying along the drscheme directory with drracket, the scheme and mzlib with racket, etc. 
Later on, if these things move to compat packages, the irrelevant directories get removed from the repo without surgeries, so the history will still be there. This shows some of the tricks that might be involved in the current switch: if you'd want to have some compat package *now*, the right thing to do would be: * do a simple filter-branch to extract drscheme (and other such collections) in a new repository for compat * for drracket: do a filter-branch that keeps *both* directories in, then commit a removal of drscheme. (Optionally, use rebase to move the deletion backward...) Going back to the repo structure change that you want and the reason that I said that doing moves between the package directories post-restructure is destructive should be clear now: say that you move collects/A/x into foo/A/x as part of the restructure. Later you realize that A/x should go into the bar package instead so you just move it to bar/A/x. The history is now in, including the rename, but later on when bar is split into a separate
Re: [racket-dev] proposal for moving to packages: repository
At Thu, 23 May 2013 07:09:17 -0400, Eli Barzilay wrote: Relevant history is vague. The history I want corresponds to `git log --follow' on each of the files that end up in a repository. The thing that you can't do with filter-branch is keep the complete history if you remove files from the history -- the files that are gone go with their history. That's true if you use `git filter-branch' in a particular way. I'll suggest an alternative way, which involves filtering the set of files in a commit-specific way. That is, the right set of files to keep for each commit are not the ones in the final place, but the ones whose history we need at each commit. To make sure I'm not confused, I've implemented this idea. My implementation is unlikely to be exactly right, yet, but I think it works as a proof of concept. The enclosed slice.rkt script takes a subdirectory and a destination directory. Run it in the top directory of a git repository, and it finds all the files in the given subdirectory, and then it closes over the history of each file via `git log --follow'. From that point, we could use the computed set of paths as the ones to keep during a `git filter-branch' on every commit, but that's not ideal. For example, a file in collection a that is destined for package a may have originated in b (think mzlib), where the same-named file sticks around in b after the copy. It's nicer and cleaner to have irrelevant files disappear after the relevant copy/move is made. So, I took one more step: slice.rkt constructs a range of commits during which the file should exist, based on when it was moved or copied. (Forks and merges are a minor obstacle, which the script works around by enlarging ranges to hit commits in the `--first-parent' traversal.) Conceptually, the result is a mapping from commit ids to paths, but that would be a big table to read on every `filter-branch' step, so it's reported as a table of commits with enter/leave transitions. 
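[Editorial sketch, not part of the original message.] The enter/leave idea described above can be illustrated roughly as follows. This is not the enclosed script: the plain-text file formats, the function name `apply_transitions', and the restriction to paths without spaces are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical plain-text stand-ins (assumed formats, illustration only):
#   state file:   one kept path per line
#   actions file: "<commit-id> enter|leave <path>" per line

# apply_transitions COMMIT STATE ACTIONS
# Rewrites STATE in place, applying the enter/leave actions listed for COMMIT.
apply_transitions() {
  commit=$1
  state=$2
  actions=$3
  tmp=$(mktemp) || return 1
  awk -v c="$commit" -v actions="$actions" '
    BEGIN {
      # Collect the actions that apply to this commit: act[path] = enter|leave
      while ((getline line < actions) > 0) {
        split(line, f, " ")
        if (f[1] == c) act[f[3]] = f[2]
      }
    }
    # Keep every current path unless this commit says it leaves.
    !($0 in act) || act[$0] != "leave" { print; seen[$0] = 1 }
    END {
      # Add paths that enter at this commit and are not already kept.
      for (p in act) if (act[p] == "enter" && !(p in seen)) print p
    }
  ' "$state" | sort > "$tmp" && mv "$tmp" "$state"
}
```

Calling the function once per commit, in history order, keeps only a small per-commit delta in memory rather than a full commit-to-paths table, which seems to be the point of reporting transitions instead of a complete mapping.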
The output of slice.rkt is two files: state.rktd for the set of files to be kept in the initial commit, and actions.rktd to specify the transitions. The enclosed prune.rkt script works with `git filter-branch --index-filter'. It uses actions.rktd (read-only) and state.rktd (which it updates via transitions).

The Racket git repo is large, so I've only tried the `git filter-branch' step so far on smaller repos, such as the iplt repository. In my clone of iplt, I `git mv'ed web/internal to ex/internal. Then, with the scripts in /tmp,

  racket /tmp/slice.rkt ex /tmp
  git filter-branch --index-filter "racket /tmp/prune.rkt /tmp" --prune-empty

leaves the repo with only the files of ex, and `git log --follow' on various files looks right. I'll try on a clone of the Racket repo and report back.

FWIW, before doing this for real, I'd want to add a `--msg-filter' that extends each commit message to add the original commit id, since we have references to the old ids in various places (and so it would be handy to have them in the new repos).

slice.rkt Description: Binary data
prune.rkt Description: Binary data
_ Racket Developers list: http://lists.racket-lang.org/dev
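[Editorial sketch, not part of the original message.] A message filter of the kind mentioned above might look roughly like this. `git filter-branch' feeds the original message on stdin and exports the id of the commit being rewritten as `GIT_COMMIT'; the function name and the wording of the appended line are made up for the example.

```shell
#!/bin/sh
# Message filter sketch: copy the original commit message from stdin,
# then append the id of the commit being rewritten. The trailer text
# "original commit:" is an arbitrary choice, not an established convention.
append_original_id() {
  cat
  printf '\noriginal commit: %s\n' "$GIT_COMMIT"
}
```

Since filter-branch evaluates its filters in its own shell, the body would normally be passed inline rather than by function name, for example:

  git filter-branch --msg-filter 'cat; printf "\noriginal commit: %s\n" "$GIT_COMMIT"' -- --all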
Re: [racket-dev] proposal for moving to packages: repository
Yesterday, Matthew Flatt wrote: We already have a system for constructing a script that can move files around and adjust content as needed: git. The script that I'm talking about *would* be in the repository, of course. It will essentially become a replacement for the distribution specs -- with the following differences: * Much less sophisticated, since it'll be just verbatim paths * Enforced via a package-aware build. * Easily translated into a git operation to split the monolithic repo. And with all of that, it is a truly gradual change -- allowing work on the package front to proceed without disturbing anyone's work environment until the repositories are physically split. As long as some of us are trying to write that script while others are changing the existing directories and files, there will be collisions. That's true, but the downside of changing the structure and having files and directories move post structure change will completely destroy the relevant edit history of the files, since it will not be carried over to the repos once it's split. Meta-note: I'm not arguing this as something that I strongly care about personally. I'm fine with nuking the whole history and start from fresh repositories post-split. I'm just trying to make the damage explicit for those who do care about keeping that history. In addition, I'm trying to make the move to packages as painless as possible for people -- your suggestion introduces three big changes: (a) structure change, (b) packages, (c) repository+structure change; and my suggestion eliminates (a), and a large part of (c) which will be a byproduct of (a). The reason that I think it makes more sense is that it allows package-based builds to start as soon as possible (even now, if the build is working with it), without waiting for anyone to adapt anything. I want to minimize conflicts and maximize the number of people who can help refine the package structure. 
The only point of loss that I see is the equivalence of the check-dists as a test in drdr -- but even that is completely minor, since drdr itself would also switch to package-based builds, and therefore dependency problems would still get reported by drdr. What other conflicts (ones that won't be detected by nightly or drdr builds) do you see? I think a lot of people on this list are eager to contribute to the shift into packages. As someone close to the new structure, I'm telling you my best guess at how you can help and be in a position to help more: let us switch the repo sooner rather than later. As another meta-point: I'm probably at the top 2% of eagerness to switch. The current distribution thing is full of stuff that I would be very happy to see gone; the package-level dependency problems are things that I have been complaining about for years (and usually I'd be the only one to do so, and get some weak support only after huge emails trying to explain the future damage). In addition to that, back when the general direction was to keep the single repository as a place for all of the main package sources I sighed at the prospect of having the distribution-spec linger on as a specification of package splitting -- and I preferred to move into a split-by-directory structure to simplify things; so the move to separate repositories is something that is way more appealing to me. In short, I *very* much want this to happen, and I want it to happen as soon as possible. And this is exactly why I've made this suggestion: it allows an immediate switch. No need for any kind of convincing or discussion. As long as people agree on the end result of splitting into repositories, the package work continues as planned, unstoppable and undelay-able by people who are not dealing with packages. 
(And as a side note: even in the imaginary case that eventually there's some anti-package or anti-repo-split revolution, nothing is lost, since the result is still a better build + distribution process.) -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
Yesterday, Eric Dobson wrote: On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote: In my tree, I have 20M of compiled code and 13M of source. I like the idea of a reduction of about 50% in size of downloads. I'm not sure if something on the order of 10M is something to worry about optimizing, that takes like 5-6 seconds to download on a 15Mbit connection. And a minute on a much slower connection.

I don't know how Jay got those numbers, but I have a very different picture:

  363M  Current installed tree
  278M  No-source tree (with docs)
   56M  Installed textual tree (has no docs and scrbl files)
   42M  Same minus sources

If a package based installation is roughly like the textual thing, and given that it's easy to extend it to a full installation by adding packages, then we're talking about going from a 363M tree down to a 42M thing. I think that the minimal core racket would be even smaller than the textual thing: once I remove things that look like they shouldn't be there, it goes down to 28M.

The impact of having a huge tree currently is pretty big, IMO. One example is that it is impractical to have random linux utilities implemented in Racket if you need to drag in a 363M working environment. It's true that you could in theory use the textual thing, but the monolithic tree makes it hard for linux distro packagers to split things into a small core -- hard enough that nobody did it so far. Another example is the few brave people who tried to make things work on small devices, which usually starts with a huge effort to get rid of unnecessary stuff.

Finally -- consider J. Random User -- installing a 360M thing on your computer is something that you'd worry about much more than a 28M thing. The smaller thing is at a point where you won't worry about it being left somewhere, and at a point where it's fine to install as a kind of shared runtime thing for someone who wants to distribute racket-based applications.
-- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote: That's true, but the downside of changing the structure and having files and directories move post structure change will completely destroy the relevant edit history of the files, since it will not be carried over to the repos once it's split. It's possible that we're talking past each other due to me not getting this point. Why is it not possible to carry over history? The history I want corresponds to `git log --follow' on each of the files that end up in a repository. I'm pretty sure that such a history of commits can be generated for any given set of files, even if no ready-made tool exists already (i.e., 'git' is plenty flexible that I can script it myself). Or maybe I'm missing some larger reason? _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
On Wed, May 22, 2013 at 8:21 PM, Matthew Flatt mfl...@cs.utah.edu wrote: At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote: That's true, but the downside of changing the structure and having files and directories move post structure change will completely destroy the relevant edit history of the files, since it will not be carried over to the repos once it's split. It's possible that we're talking past each other due to me not getting this point. Why is it not possible to carry over history? The history I want corresponds to `git log --follow' on each of the files that end up in a repository. I'm pretty sure that such a history of commits can be generated for any given set of files, even if no ready-made tool exists already (i.e., 'git' is plenty flexible that I can script it myself). Or maybe I'm missing some larger reason? I was going to comment on the same thing. While a naive use of git filter-branch might not retain the history, it should be entirely possible to do something a little more intelligent and keep that history. Essentially each of the new repositories could keep the entire history of the original repository, followed by a massive move/rename, then moving forward with an individual package. --Carl _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
I agree that 363 to 28 would be a great win. But you seem to be describing the difference between Full Racket and core racket, not the difference between binary and source. For binary vs source, I think you are providing a good argument for the usefulness of a no source distribution. Some people want to use tools written in Racket, and the fact that the tools are written in Racket is immaterial to them. They should be able to have just the binary versions. On Wed, May 22, 2013 at 12:30 PM, Eli Barzilay e...@barzilay.org wrote: Yesterday, Eric Dobson wrote: On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote: In my tree, I have 20M of compiled code and 13M of source. I like the idea of a reduction of about 50% in size of downloads. I'm not sure if something on the order of 10M is something to worry about optimizing, that takes like 5-6 seconds to download on a 15Mbit connection. And a minute on a much slower connection. I don't know how Jay got those numbers, but I have a very different picture: 363M Current installed tree 278M No-source tree (with docs) 56M Installed textual tree (has no docs and scrbl files) 42M Same minus sources If a package based installation is roughly like the textual thing, and given that it's easy to extend it to a full installation by adding packages, then we're talking about going from a 363M tree down to a 42M thing. I think that the minimal core racket would be even smaller than the textual thing: once I remove things that look like they shouldn't be there, it goes down to 28M. The impact of having a huge tree currently is pretty big, IMO. One example is that it is impractical to have random linux utilities implemented in Racket if you need to drag in a 363M working environment. It's true that you could in theory use the textual thing, but the monolithic tree makes it hard for linux distro packagers to split things into a small core -- hard enough that nobody did it so far. 
Another example is the few brave people who tried to make things work on small devices, which usually starts with a huge effort to get rid of unnecessary stuff. Finally -- consider J. Random User -- installing a 360M thing on your computer is something that you'd worry about much more than a 28M thing. The smaller thing is at a point where you won't worry about it being left somewhere, and at a point where it's fine to install as a kind of shared runtime thing for someone who wants to distribute racket-based applications. -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
I've been using Racket (and DrRacket) to teach programming to architecture students. These are not sophisticated users, so any move that makes it more difficult for them to use Racket is not good news. What happened to the batteries included motto? Just my 0.1 cents. Best, António. _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
On Mon, May 20, 2013 at 2:23 PM, Jose A. Ortega Ruiz j...@gnu.org wrote: Here's hope that down the line there'll be binary+source packages that end users can install with the same ease as today. Matthew's email mentioned this a little, but the plan is that: $ raco pkg install drracket will install source as well as binaries. The big change is that the distribution you get from http://racket-lang.org/download/ won't include all of that stuff. Sam _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages: repository
On Mon, May 20, 2013 at 6:07 PM, Matthew Flatt mfl...@cs.utah.edu wrote:
> To put it another way and overstate a little: I'm trying to get buy-in from dev to make the switch to packages wholesale. The little bit of staging in the plan is to make the conversion itself easier, and not to simplify the switch for developers.

Can you spell out how the directory movement you described will make the conversion easier? Here's what I think the simplest move to multiple repositories would be:

1. Use `git filter-branch` to create a new repository for the drracket package from the current git repository. [1]

2-N. Repeat step 1 for all the other packages we plan to split out.

N+1. Use `git rm` to remove everything that's been split out from the main repository.

I think the key piece of information that makes this work is that `git filter-branch` lets you do the subdirectory manipulation that you seem to be planning to do manually. In particular, see the last example in the `git filter-branch` man page [2], which is about moving things to a subdirectory.

For example, here's the `realm` collect split out, using `git filter-branch` twice: https://github.com/samth/realm-split

The commands I used are here: https://gist.github.com/samth/5618014

[1] https://help.github.com/articles/splitting-a-subpath-out-into-a-new-repository
[2] https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html

Sam
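Step 1 above can be sketched as a small shell function. This is only a sketch under assumptions, not the commands from the gist: the function name `split_collect` and its arguments are hypothetical, and it relies on the `--subdirectory-filter` and `--prune-empty` options of `git filter-branch`, run on a fresh clone so that the original repository is never rewritten.

```shell
#!/bin/sh
# Sketch: extract the history of one subdirectory into its own repository.
# split_collect SRC_REPO SUBDIR DEST   (hypothetical helper, not from the thread)
split_collect() {
    src=$1
    subdir=$2
    dest=$3
    # Work on a clone so that the source repository is left untouched.
    git clone --quiet --no-local "$src" "$dest" || return 1
    (
        cd "$dest" || exit 1
        # Newer git versions print a warning and pause before filter-branch
        # runs; this environment variable suppresses that.
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch --prune-empty \
                --subdirectory-filter "$subdir" HEAD >/dev/null 2>&1
    )
}
```

For example, `split_collect /path/to/plt collects/realm /path/to/realm-split` (both paths are placeholders) would leave a repository whose root holds what used to live under `collects/realm`, with only the commits that touched that directory.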
Re: [racket-dev] proposal for moving to packages
On Mon, May 20, 2013 at 11:20 PM, Juan Francisco Cantero Hurtado i...@juanfra.info wrote:
> On 05/20/13 23:24, Carl Eastlund wrote:
>> On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu wrote:
>>> On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:
>>>> Eventually, when the dust settles, I think we'll want to convert every directory to its own git repo, and then we can incorporate the individual repos as git submodules.

>>> One nice thing about the current repo organization is that push notifications for every part of the PLT codebase go to all of the developers. Will that still be available in this organization scheme? (I don't care if it's opt-in too much, but opt-out will hopefully mean more eyes see the code)

>>> Cheers,
>>> Asumu

>> Overall, I'm really glad to see Racket moving into the package system. I think it will be good for both (the Racket core and the package system). I'd like to mention, though, that git submodules can be a real pain for synchronizing development of multiple repositories. They seem to have been designed primarily for importing upstream repositories, rather than for multiple peer repositories. I'm not much more fond of the alternatives I have tried, either; if we're committing to splitting Racket into multiple repositories as well as multiple packages, we should be aware there may be another minor git learning curve ahead.

>> Thanks to Jay and Matthew for working on all of this!

> I also think that git submodules are a bad idea for packages. One git repo per package is more simple and less problematic.

> Thanks for the hard work :)

Git submodules imply one repo per package. A submodule is a mechanism that imports external repos into a checkout of a client repo, and records the specific commit of the checkout so that there is a correlation of the commits in each repo stored with the client. If we're going to use multiple repositories, we definitely need something like submodules in order to retain a shared commit history.

--Carl
Re: [racket-dev] proposal for moving to packages
On Tue, May 21, 2013 at 12:16 AM, Antonio Menezes Leitao antonio.menezes.lei...@ist.utl.pt wrote:
> I've been using Racket (and DrRacket) to teach programming to architecture students. These are not sophisticated users, so any move that makes it more difficult for them to use Racket is not good news. What happened to the "batteries included" motto?

The new organization does not imply that you can't download one thing and get the core plus many packages. In fact, we intend to make it more flexible so that teachers could easily create a distribution for their class with the material they need (and not the stuff they don't... like textbooks in German.)

Jay

> Just my 0.1 cents.

> Best,
> António.

-- 
Jay McCarthy j...@cs.byu.edu
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay
The glory of God is Intelligence - DC 93
Re: [racket-dev] proposal for moving to packages
On Mon, May 20, 2013 at 10:05 PM, Eric Dobson eric.n.dob...@gmail.com wrote:
> I'm not sure I follow on why binary packages make it easier to reduce dependencies between packages, or why binary packages offer faster installs. I'm guessing that binary packages prevent cyclic dependencies between packages, but it seems like there are many other options that still get this side effect. Such as explicit checks when building the package.

If you have the source, then you need all the phase = 1 dependencies, but if you just have the binary then you only need the phase = 0 deps. Similarly, for building the documentation.

> For faster installs, the only benefit I see of binary packages over precompiled source packages is a small savings in size which doesn't seem like it would amount to much of the install time.

In my tree, I have 20M of compiled code and 13M of source. I like the idea of a reduction of about 50% in size of downloads.

However, the faster install point is really about the fact that users won't need to run raco setup and do the compilation/documentation build once they do the download of the source.

Jay

> Can someone explain the claims for binary packages?

> On Mon, May 20, 2013 at 8:57 PM, Jon Zeppieri zeppi...@gmail.com wrote:
>> On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote:
>>> [snip]
>>> Example: Imagine I'm in the middle of writing a Racket program and am wondering about characteristics of some kind of I/O port in Racket. With transparent source accessibility, I can just click on an identifier in my program in DrRacket to start browsing the implementation. Maybe I see a possible improvement, or seeing the source pre-empts yet another email list question that otherwise only Matthew could answer, or I feel empowered to go add a new feature. If the source is not as accessible, then I'm more likely to be a mere naive user of the tools, rather than to understand the tools and help improve them.

>> +inf.0

>> Though the easiest way to make the source available is just to keep it in the distribution. I'll be sad to see it go.

>> -Jon
Re: [racket-dev] proposal for moving to packages
On Tue, May 21, 2013 at 6:22 AM, Jay McCarthy jay.mccar...@gmail.com wrote:
> On Tue, May 21, 2013 at 12:16 AM, Antonio Menezes Leitao antonio.menezes.lei...@ist.utl.pt wrote:
>> I've been using Racket (and DrRacket) to teach programming to architecture students. These are not sophisticated users, so any move that makes it more difficult for them to use Racket is not good news. What happened to the "batteries included" motto?

> The new organization does not imply that you can't download one thing and get the core plus many packages. In fact, we intend to make it more flexible so that teachers could easily create a distribution for their class with the material they need (and not the stuff they don't... like textbooks in German.)

I want to emphasize this point: there are no plans to change which libraries are included when you download Racket. All of our crazy set of batteries will still be included.

Robby
Re: [racket-dev] proposal for moving to packages: repository
At Tue, 21 May 2013 00:09:49 -0700, Sam Tobin-Hochstadt wrote:
> On Mon, May 20, 2013 at 6:07 PM, Matthew Flatt mfl...@cs.utah.edu wrote:
>> To put it another way and overstate a little: I'm trying to get buy-in from dev to make the switch to packages wholesale. The little bit of staging in the plan is to make the conversion itself easier, and not to simplify the switch for developers.

> Can you spell out how the directory movement you described will make the conversion easier?

I think we won't get an ideal package split on the first N tries, and it will be easier to move files and directories around in one repository (using `git mv') instead of among multiple repositories. When we finally have mostly the right split, then we can use `git filter-branch'.
Re: [racket-dev] proposal for moving to packages
Jay McCarthy wrote:
> If you have the source, then you need all the phase = 1 dependencies, but if you just have the binary then you only need the phase = 0 deps.

That's assuming that you want to run the source, but I think that the people who are arguing about still having the source available in the distribution are mostly interested in reading the source, in which case having only the source for the phase = 0 dependencies would probably be a good enough approximation...

Philippe
Re: [racket-dev] proposal for moving to packages
On 5/20/13 4:42 PM, Matthew Flatt wrote:
> I used to think that we'd take advantage of the package manager by gradually pulling parts out of the Racket git repo and making them packages. Now, I think we should just shift directly to a small-ish Racket core, making everything else a package immediately. "Core" means enough to run `raco pkg'.

> A key point to remember is that "package" does not mean "omitted from the distribution". Instead, we'll construct a distribution by combining the core with a selected set of packages. Initially the selected set of packages will cover everything in the current distribution.

> Jay and I have been lining up the pieces for this change (it's difficult to make a meaningful proposal without trying a lot of the work, first), and I provide a sketch of the overall plan below.

> This plan has two prominent implications:

>  * The current git repo's directory structure will change.

Will this directory structure change have an impact on how modules are referenced?

My biggest concern is the Realm of Racket book, which is about to come out. It sounds like this change could potentially cause a lot of confusion if it alters the collects organization.

Thanks,
David
Re: [racket-dev] proposal for moving to packages
On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote:
> On Mon, May 20, 2013 at 10:05 PM, Eric Dobson eric.n.dob...@gmail.com wrote:
>> I'm not sure I follow on why binary packages make it easier to reduce dependencies between packages, or why binary packages offer faster installs. I'm guessing that binary packages prevent cyclic dependencies between packages, but it seems like there are many other options that still get this side effect. Such as explicit checks when building the package.

> If you have the source, then you need all the phase = 1 dependencies, but if you just have the binary then you only need the phase = 0 deps. Similarly, for building the documentation.

Like Philippe said, a viewable source doesn't require this, only source that can be compiled. Whether or not we want to support that I don't know, but it seems like it should be possible.

>> For faster installs, the only benefit I see of binary packages over precompiled source packages is a small savings in size which doesn't seem like it would amount to much of the install time.

> In my tree, I have 20M of compiled code and 13M of source. I like the idea of a reduction of about 50% in size of downloads.

I'm not sure if something on the order of 10M is something to worry about optimizing, that takes like 5-6 seconds to download on a 15Mbit connection. And a minute on a much slower connection.

> However, the faster install point is really about the fact that users won't need to run raco setup and do the compilation/documentation build once they do the download of the source.

Why would you need to run raco setup if the source was already precompiled? Also, how well does the source compress compared to compiled code?
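As a quick check of the download-time estimate in the message above, here is a small calculation. It assumes 1 MB = 8 Mbit and ignores protocol overhead; the function name `download_seconds` is illustrative only.

```shell
#!/bin/sh
# Estimate download time in seconds for SIZE_MB megabytes over a
# RATE_MBIT megabit-per-second link (1 MB taken as 8 Mbit, no overhead).
download_seconds() {
    awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", mb * 8 / rate }'
}

download_seconds 10 15     # the 10M difference on a 15Mbit connection
download_seconds 10 1.5    # the same download on a link ten times slower
```

Under these assumptions the first call gives about 5.3 seconds, in line with the "5-6 seconds" quoted, and the slower link lands a little under a minute.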
Re: [racket-dev] proposal for moving to packages
On 05/21/13 12:21, Carl Eastlund wrote:
> [...]
> Git submodules imply one repo per package. A submodule is a mechanism that imports external repos into a checkout of a client repo, and records the specific commit of the checkout so that there is a correlation of the commits in each repo stored with the client. If we're going to use multiple repositories, we definitely need something like submodules in order to retain a shared commit history.

You're right. I was thinking of git subtree. Sorry for the confusion.
Re: [racket-dev] proposal for moving to packages
At Tue, 21 May 2013 10:46:29 -0400, David Van Horn wrote:
> On 5/20/13 4:42 PM, Matthew Flatt wrote:
>> This plan has two prominent implications:

>>  * The current git repo's directory structure will change.

> Will this directory structure change have an impact on how modules are referenced?

The package system is designed to separate the way that modules are referenced from the way that they are installed. Whether the module `realm/chapter10/source' is part of the core, installed by the user as a package, or included as a pre-installed package in a distribution, a reference to the module within a program is always `(require realm/chapter10/source)'.

A reference to the module of the form "look in the 'collects' directory's 'realm' subdirectory", however, would be broken by the directory-structure change, and we'd have to do extra work to manage that (such as keeping a note in the core or special-cased distributions to point to the new path).
Re: [racket-dev] proposal for moving to packages
At Tue, 21 May 2013 05:29:19 -0600, Jay McCarthy wrote:
> If you have the source, then you need all the phase = 1 dependencies, but if you just have the binary then you only need the phase = 0 deps.

That's the right idea, but not precisely correct. If you `(require (for-syntax ...))' a module, then the module is still needed at run time, because it might have a `(require (for-template ...))', and so on.

A module referenced through `lazy-require' in a `for-syntax' import, however, could conceivably be omitted. For example, a large part of the Typed Racket compiler might be omitted as a run-time dependency for a Typed Racket program. We're not quite to the place where that will work out well, but I think we'll get there.

> Similarly, for building the documentation.

That's really the big one in the short run, I think. It's difficult to have anything small and still have Racket-style documentation.

At Tue, 21 May 2013 08:10:02 -0700, Eric Dobson wrote:
> Why would you need to run raco setup if the source was already precompiled?

It's easy to underestimate the complexity of `raco setup'. Indeed, if every `raco setup' started from scratch, it would be pretty easy. Instead, `raco setup' has to perform an incremental computation based on an inferred set of filesystem changes, where the computation to incrementalize includes bytecode compilation, document rendering, document database cross-referencing, path adjustments, and more --- and it's all supposed to work in parallel, it's not supposed to leave things in a bad state if it gets interrupted, it should recover from most any state including bad states inadvertently created by novice programmers, it's supposed to support shared non-writable parts and user-specific writable parts, it's supposed to support PLTCOLLECTS and PLTCOMPILEDROOTS, and it's supposed to have a dozen other properties that I'm forgetting at the moment.

To answer the specific question, one reason you need to run `raco setup' on a precompiled collection is to fix up the documentation cross-reference database and references, get libraries and launchers in place, and perform whatever install-time actions the package wants.

Yes, we can make `raco setup' work with packages that contain both source and binaries, and I guess I'll go work on that instead of other directions.
Re: [racket-dev] proposal for moving to packages: repository
Yesterday, Matthew Flatt wrote:
> Concretely, new repositories that are just a subset of the current repo would be off-by-one in directory structure compared to a normal package. Each package should correspond to a subtree starting from the collects level, not the parent of collects. We could massage the two views into one, but I'd rather not.

That's really easy to deal with, and doesn't contradict what I suggested, *but* given:

> To put it another way and overstate a little: I'm trying to get buy-in from dev to make the switch to packages wholesale. [...]

And even more, given:

5 hours ago, Matthew Flatt wrote:
> I think we won't get an ideal package split on the first N tries, and it will be easier to move files and directories around in one repository (using `git mv') instead of among multiple repositories. When we finally have mostly the right split, then we can use `git filter-branch'.

I think that there's a much easier and more elegant way to do this, which is even easier for all developers. Roughly speaking, it's flipping what I suggested yesterday and doing it the other way:

* Keep the repository as-is, no structural changes at all.

* Keep working on things as usual, including work on the package system and everything that is related.

* As it gets to a workable state, keep a script that will *split* the monolithic repo into separate packages. This script can start very simple, for example, a naive thing would be:

    cd $MAINTREE
    mkdir $PACKAGES/drracket
    mv collects/drracket collects/drscheme $PACKAGES/drracket

  Everything that deals with packages would start from a fresh main repo and an empty package directory, and will construct the packages from it. So, for example, the build will still make each package independently, and distribution is still done by assembling packages.

* The main point is related to what you said above: the package splittage is determined by the script, so if you find out that some file belongs in a different package, or that packages need to be combined, or split differently, or whatever -- this is all done by just changing the script. So you get two birds with a single stone: it's easy to experiment freely in the early stages, and it's easy to adjust things when the split converges to something that works fine.

* When everything is working smoothly -- with the main effect being a resolution of dependencies, both of existing code and in terms of people being aware of them -- at this point it will be a good time to switch to separate repos, and since all developers have already gotten used to the package, there is now just the repo change, and nothing else -- so it becomes a technical point like switching from svn to git, not piled up on the more substantial change. As a side-effect, the final directory-splitting script can be used with git's filter-branch to create the new repos.

I think that this offers the best in terms of being flexible as needed while work is in progress, and separating the changes that people need to adjust to, which should make the whole process more comfortable.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!
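The naive script above can be grown into something table-driven, so that changing the package split means editing only the table. The following is a minimal sketch under that assumption; the function name `split_into_packages`, the `package: dir ...` table format, and the use of `cp -R` instead of `mv` (so the main tree stays intact between runs) are all illustrative choices, not part of the proposal.

```shell
#!/bin/sh
# Sketch of a table-driven split script.
# split_into_packages MAINTREE PACKAGES TABLE
# Each TABLE line reads: "<package>: <collection-dir> [<collection-dir> ...]"
split_into_packages() {
    maintree=$1
    packages=$2
    table=$3
    while read -r pkg dirs; do
        # Skip blank lines and comment lines.
        case $pkg in ''|'#'*) continue ;; esac
        # Drop the trailing colon from the package name.
        pkg=${pkg%:}
        mkdir -p "$packages/$pkg"
        for d in $dirs; do
            # Copy rather than move, so the main tree is left as-is.
            cp -R "$maintree/collects/$d" "$packages/$pkg/"
        done
    done < "$table"
}
```

With a table line such as `drracket: drracket drscheme`, calling `split_into_packages "$MAINTREE" "$PACKAGES" packages.map` (where `packages.map` is a hypothetical file name) would reproduce the three commands in the message, except that it copies instead of moving. Moving a collection to another package is then a one-line change in the table.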
Re: [racket-dev] proposal for moving to packages: repository
[keeping the different subject since this is still about the repo.]

Yesterday, Asumu Takikawa wrote:
> One nice thing about the current repo organization is that push notifications for every part of the PLT codebase go to all of the developers. Will that still be available in this organization scheme? (I don't care if it's opt-in too much, but opt-out will hopefully mean more eyes see the code)

This is easy both in our git server (it's easy to have a shared configuration so all of them get the same notifications, and bug-fix-messages are caught in all of them), and in github (where you'll need to watch all of them).

Yesterday, Carl Eastlund wrote:
> I'd like to mention, though, that git submodules can be a real pain for synchronizing development of multiple repositories. They seem to have been designed primarily for importing upstream repositories, rather than for multiple peer repositories.

Two points about submodules:

1. My impression is that they have improved a *lot* in the past ~2 years or so. Not only in terms of better functionality, but also in terms of convenience of using them.

2. If things go the way I suggested in the other email, then there's no real need to use submodules. You need to have these repositories somewhere if you want to work on them (or a subset if you work on only some of them) -- and you should be able to get them any way you want. There's no reason for the core repository to come with submodule points for all of the packages. I think that it might make sense to keep some meta repository for people who want a convenient checkout of all packages -- but if you don't like submodules, you just don't use it.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!
Re: [racket-dev] proposal for moving to packages: repository
We already have a system for constructing a script that can move files around and adjust content as needed: git.

As long as some of us are trying to write that script while others are changing the existing directories and files, there will be collisions. We won't come up with a scripting system that handles those collisions better than git.

I want to minimize conflicts and maximize the number of people who can help refine the package structure. We all know how to use git to script changes to the repo, and we know how to work with a shared repo to make conflicts manageable. That's why I'm asking that we all change together to a new repo structure.

I think a lot of people on this list are eager to contribute to the shift into packages. As someone close to the new structure, I'm telling you my best guess at how you can help and be in a position to help more: let us switch the repo sooner rather than later. Then, everyone will be in a good position to script progress in various ways.

At Tue, 21 May 2013 14:20:33 -0400, Eli Barzilay wrote:
> [...]
Re: [racket-dev] proposal for moving to packages
On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:
> Eventually, when the dust settles, I think we'll want to convert every directory to its own git repo, and then we can incorporate the individual repos as git submodules.

One nice thing about the current repo organization is that push notifications for every part of the PLT codebase go to all of the developers. Will that still be available in this organization scheme? (I don't care if it's opt-in too much, but opt-out will hopefully mean more eyes see the code)

Cheers,
Asumu
Re: [racket-dev] proposal for moving to packages
On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu wrote:
> On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:
>> Eventually, when the dust settles, I think we'll want to convert every directory to its own git repo, and then we can incorporate the individual repos as git submodules.

> One nice thing about the current repo organization is that push notifications for every part of the PLT codebase go to all of the developers. Will that still be available in this organization scheme? (I don't care if it's opt-in too much, but opt-out will hopefully mean more eyes see the code)

> Cheers,
> Asumu

Overall, I'm really glad to see Racket moving into the package system. I think it will be good for both (the Racket core and the package system). I'd like to mention, though, that git submodules can be a real pain for synchronizing development of multiple repositories. They seem to have been designed primarily for importing upstream repositories, rather than for multiple peer repositories. I'm not much more fond of the alternatives I have tried, either; if we're committing to splitting Racket into multiple repositories as well as multiple packages, we should be aware there may be another minor git learning curve ahead.

Thanks to Jay and Matthew for working on all of this!

--Carl
Re: [racket-dev] proposal for moving to packages
On Mon, May 20 2013, Matthew Flatt wrote:
> [...]
> Some drawbacks to omitting source are immediately apparent:

>  - Users will be less able to make source changes on their systems to help us debug.

>    Having the binary form of a package installed does not preclude upgrading to a source package. So, we could ask a user to use the package manager to install the source form of, say, the drracket package, and then try out a change. In that way, users can still help, but it will be less convenient.

>  - Users will be less able to read installed code as examples.

>    Our source code is now easily available via the web interfaces at http://git.racket-lang.org/ and GitHub, so users can always look there, instead.

FWIW (and I know it's not much, but anyway), this will be a big loss for Geiser users, who right now can jump to any core function source with a single keystroke and without leaving the editor. IME, there's a huge difference between that and having to switch to a web browser to find it, both when learning and when programming new applications.

Here's hope that down the line there'll be binary+source packages that end users can install with the same ease as today.

Cheers,
jao
-- 
Nostalgia isn’t what it used to be.
Re: [racket-dev] proposal for moving to packages: repository
An hour and a half ago, Matthew Flatt wrote: I used to think that we'd take advantage of the package manager by gradually pulling parts out of the Racket git repo and making them packages. (Generally, +1. I'll reply just on the repository point here.) This plan has two prominent implications: * The current git repo's directory structure will change. [...] I very strongly object to this. While in theory git will follow everything, this requires doing some more work which most people won't know about, so a result of all of this is going to be loss of historical information. So I think that it's much better to move directly to several repositories (IIUC, one repository for each suggested toplevel directory). The only goal of the intermediate state seems to be providing some gradual change before switching to submodules -- and on one hand, I think that the new layout will force people to learn how to deal with it, and on the other, it'll make people spend work twice, once on the layout change and again on the switch to modules. So assuming that a gradual change is the goal, I think that there are better ways to do that. Here's a suggestion: * The main repository is split into the different repositories. Initially, this is done without any consideration for submodules, with the idea of having advanced gitters come up with their own solutions. * However, don't remove the main repository, just keep it as an aggregate of the content that is found in the split repositories. If the structure is going to be the same in all of them (ie, the same directories and files are in all as they are now in the single repository), then pulling changes from the new repos to the main one is going to be trivial to the point of being automated. * The new repos will not get mirrored on github. This is because github repos come with a bunch of functionality that is best kept in a single place -- like wiki pages and issues. (But see below.) 
* So the only difference would be for people who commit work to the main repo. This can be done in various ways, depending on the developers who do these commits: - Advanced developers would have all of the repos and will push directly to them. This group of people is likely to start small, and eventually have all of the core committers in it. (Core as in the people who push to the plt repo now.) As I said above, this will likely involve some experimentation for these people, which will later get translated into easy setups that will allow more people to switch to it. - Outsiders can continue to work as usual: fork the main plt repo (mostly on github) and send pull requests. The pull request will then be pushed by a core committer as it is done now, where the core committer pushes to the actual relevant repo, and that eventually propagates back to the main repo so that the contributor sees that the work was merged. The merging should usually be trivial, except in extremely rare cases where the push touches on files from different new repos. In these cases it should be possible to either split the commit into different ones for the different repos, or ask the contributor to split the commit to different ones for the different files. - The only people left are core committers who will work with the main repository. I can see a bunch of ways to deal with this. First, the commit can be sent as a pull request to one of the advanced gitters who will then do it for the actual repository. This is easier than it sounds: git has a bunch of commands to do this, and for all practical purposes, you'd just replace the git push part of your workflow with git send-email. I *think* (but I'm not 100% sure) that this work can be automated too, so it's fine if I (or some other excited soul) gets these emails and merges them. 
There is an inconvenience point here: once you send a pull request and it's merged, the actual commits that are merged (to the main repo, which you're using if you're in this group) are different objects. This is nothing new -- it's something that people who do all contributions via pull requests deal with, since we have a policy of rebasing rather than merging. Usually, when you pull from the update repo, git should notice that your changes are already there. (At least I hope it does.) Things will be less convenient for people who use git more intensely: if you have lots of branches etc. But I think that such people really should just move to the first group sooner... * This stage can go on for a while, as the code machinery involved evolves to a point of being smooth enough. By smooth, I mean that - it be easy enough to build the whole thing as you do now, -
Re: [racket-dev] proposal for moving to packages
Well, ideally there would be some new module-name-source function that could return URIs like http://path/to/file.rkt (or for that matter, file:///path/to/file.rkt), based on info.rkt for packages? Given that piece, a couple ways to do it -- favoring doing it more in Emacs vs. more in Racket -- but both involve having a local cache, and also using If-Modified-Since request headers? Maybe even the ability to prefill the cache and never expire it ... which seems awfully like source installation by other means? p.s. An approach favoring doing it more on the Racket side than on the Emacs side, could also support FRs like one I saw on the main list recently, which is that File | Open in DrRacket should be able to open remote files. That was for a classroom setting IIRC. _ Racket Developers list: http://lists.racket-lang.org/dev
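To make the caching idea above concrete, here is a minimal sketch in Python (not Racket or Emacs Lisp, purely for illustration). It assumes a hypothetical `fetch` callable that performs the HTTP request with an If-Modified-Since header and returns a `(status, body, last_modified)` tuple; `SourceCache`, `CachedSource`, and that tuple shape are all invented names here, not part of any existing Racket or Geiser API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (status, body, last_modified), as a hypothetical fetcher would return them.
FetchResult = Tuple[int, Optional[str], Optional[str]]
# The fetcher receives the URL and the If-Modified-Since value (or None).
Fetcher = Callable[[str, Optional[str]], FetchResult]


@dataclass
class CachedSource:
    body: str
    last_modified: Optional[str]


class SourceCache:
    """In-memory cache of remote source files, revalidated per request."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._entries: Dict[str, CachedSource] = {}

    def get(self, url: str) -> str:
        cached = self._entries.get(url)
        # Only send If-Modified-Since when we already hold a copy.
        since = cached.last_modified if cached else None
        status, body, last_modified = self._fetch(url, since)
        if status == 304 and cached is not None:
            # Server says our copy is still current: reuse it.
            return cached.body
        if status == 200 and body is not None:
            self._entries[url] = CachedSource(body, last_modified)
            return body
        raise LookupError(f"could not fetch {url}: HTTP {status}")
```

A cache that is prefilled and never revalidated would simply skip the `fetch` call when an entry exists, which is the "source installation by other means" case mentioned above.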
Re: [racket-dev] proposal for moving to packages: repository
At Mon, 20 May 2013 18:27:34 -0400, Eli Barzilay wrote: An hour and a half ago, Matthew Flatt wrote: This plan has two prominent implications: * The current git repo's directory structure will change. [...] I very strongly object to this. While in theory git will follow everything, this requires doing some more work which most people won't know about, so a result of all of this is going to be loss of historical information. So I think that it's much better to move directly to several repositories (IIUC, one repository for each suggested toplevel directory). The only goal of the intermediate state seems to be providing some gradual change before switching to submodules -- and on one hand, I think that the new layout will force people to learn how to deal with it, and on the other, it'll make people spend work twice, once on the layout change and again on the switch to modules. So assuming that a gradual change is the goal, I think that there are better ways to do that. It's about a kind of gradual change, but not quite so gradual. I would like to switch immediately to a package-oriented view of Racket, instead of thinking about packages as something that you get by squinting at our current tree. Concretely, new repositories that are just a subset of the current repo would be off-by-one in directory structure compared to a normal package. Each package should correspond to a subtree starting from the collects level, not the parent of collects. We could massage the two views into one, but I'd rather not. At the same time, I agree that it's tricky to properly extract history for the new repositories, and there will be many issues in dealing with multiple repositories (e.g., submodules may not be the way to go). So, I'd like to delay that part until a second step. To put it another way and overstate a little: I'm trying to get buy-in from dev to make the switch to packages wholesale. 
The little bit of staging in the plan is to make the conversion itself easier, and not to simplify the switch for developers. _ Racket Developers list: http://lists.racket-lang.org/dev
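The "off-by-one" point in the message above can be illustrated with a small sketch in Python. It assumes that a repository which is a plain subset of the current tree keeps a leading `collects` component, while a package starts one level below it; the function name `package_relative_path` and the example paths are hypothetical and only serve to show the shift.

```python
from pathlib import PurePosixPath
from typing import Optional


def package_relative_path(repo_path: str, root: str = "collects") -> Optional[str]:
    """Drop the leading `root` component so the path starts at the
    collection level, as a package layout would (assumed, see lead-in).

    Returns None for paths that do not live below `root`.
    """
    parts = PurePosixPath(repo_path).parts
    if len(parts) < 2 or parts[0] != root:
        return None
    return str(PurePosixPath(*parts[1:]))
```

For instance, under these assumptions `collects/racket/list.rkt` in a subset repository would correspond to `racket/list.rkt` inside a package, which is the one-level difference the message refers to.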
Re: [racket-dev] proposal for moving to packages
I'm calling for making Racket and package source transparently accessible, even though not actually bundled into distribution downloads... Racket has a research and education bent, and also attracts some of the more sophisticated developers. For all of these audiences, there's a tradition of accessibility of source, and arguably value in that. I think transparent navigability to source would be appropriate for Racket. Transparent navigability to source could mean that DrRacket will download source on-demand for any binary package that is installed, rather than source having to be bundled with the package, or requiring user to go get source separately. Admittedly, I think source accessibility is not as important in Racket as in Emacs. (Because, for general programming, the Racket documentation is sufficient and the source wouldn't help. And for extension of the programming environment, which was one of Emacs's greatest achievements, extending DrRacket is much harder; plus, the DrRacket source is not much help if you didn't previously tackle the manuals on frameworks and such, which almost no one does.) But there are uses for source accessibility, especially for independent add-on packages, and the principle of being able to easily pop the hood still has value. Example: Imagine I'm in the middle of writing a Racket program and am wondering about characteristics of some kind of I/O port in Racket. With transparent source accessibility, I can just click on an identifier in my program in DrRacket to start browsing the implementation. Maybe I see a possible improvement, or seeing the source pre-empts yet another email list question that otherwise only Matthew could answer, or I feel empowered to go add a new feature. If the source is not as accessible, then I'm more likely to be a mere naive user of the tools, rather than to understand the tools and help improve them. 
Side note: I'm also looking forward to seeing how this new packaging works out, especially if it leads to me being able to ship small binary packages for iPhone/Mac/Windows, implemented in Racket. (I don't care about open source principles on those very closed platforms; I just want their money. Which is totally different from what I want from an intellectually-inclined open source development platform.) Neil V. _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
On 05/20/13 23:24, Carl Eastlund wrote: On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu wrote: On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote: Eventually, when the dust settles, I think we'll want to convert every directory to its own git repo, and then we can incorporate the individual repos as git submodules. One nice thing about the current repo organization is that push notifications for every part of the PLT codebase go to all of the developers. Will that still be available in this organization scheme? (I don't care if it's opt-in too much, but opt-out will hopefully mean more eyes see the code) Cheers, Asumu Overall, I'm really glad to see Racket moving into the package system. I think it will be good for both (the Racket core and the package system). I'd like to mention, though, that git submodules can be a real pain for synchronizing development of multiple repositories. They seem to have been designed primarily for importing upstream repositories, rather than for multiple peer repositories. I'm not much more fond of the alternatives I have tried, either; if we're committing to splitting Racket into multiple repositories as well as multiple packages, we should be aware there may be another minor git learning curve ahead. Thanks to Jay and Matthew for working on all of this! I also think that git submodules are a bad idea for packages. One git repo per package is more simple and less problematic. Thanks for the hard work :) _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
Juan Francisco Cantero Hurtado wrote at 05/20/2013 11:20 PM: I also think that git submodules are a bad idea for packages. One git repo per package is more simple and less problematic. Do people expect to often do commits involving changes across these package boundaries? If so, would another option be to keep a single repo, not use these Git submodules, and just have Racket translate the Git paths behind-the-scenes for packages coming from this core Racket repo? Neil V. _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote: [snip] Example: Imagine I'm in the middle of writing a Racket program and am wondering about characteristics of some kind of I/O port in Racket. With transparent source accessibility, I can just click on an identifier in my program in DrRacket to start browsing the implementation. Maybe I see a possible improvement, or seeing the source pre-empts yet another email list question that otherwise only Matthew could answer, or I feel empowered to go add a new feature. If the source is not as accessible, then I'm more likely to be a mere naive user of the tools, rather than to understand the tools and help improve them. +inf.0 Though the easiest way to make the source available is just to keep it in the distribution. I'll be sad to see it go. -Jon _ Racket Developers list: http://lists.racket-lang.org/dev
Re: [racket-dev] proposal for moving to packages
I'm not sure I follow on why binary packages make it easier to reduce dependencies between packages, or why binary packages offer faster installs. I'm guessing that binary packages prevent cyclic dependencies between packages, but it seems like there are many other options that still get this side effect. Such as explicit checks when building the package. For faster installs, the only benefit I see of binary packages over precompiled source packages is a small savings in size which doesn't seem like it would amount to much of the install time. Can someone explain the claims for binary packages? On Mon, May 20, 2013 at 8:57 PM, Jon Zeppieri zeppi...@gmail.com wrote: On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote: [snip] Example: Imagine I'm in the middle of writing a Racket program and am wondering about characteristics of some kind of I/O port in Racket. With transparent source accessibility, I can just click on an identifier in my program in DrRacket to start browsing the implementation. Maybe I see a possible improvement, or seeing the source pre-empts yet another email list question that otherwise only Matthew could answer, or I feel empowered to go add a new feature. If the source is not as accessible, then I'm more likely to be a mere naive user of the tools, rather than to understand the tools and help improve them. +inf.0 Though the easiest way to make the source available is just to keep it in the distribution. I'll be sad to see it go. -Jon _ Racket Developers list: http://lists.racket-lang.org/dev
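As an illustration of the "explicit checks when building the package" alternative raised at the top of this message, here is a minimal sketch in Python of a cycle check over declared package dependencies. It is not how the Racket package manager works; the function name `find_dependency_cycle`, the dictionary format, and the package names in the usage note are assumptions made for the example.

```python
from typing import Dict, List, Optional


def find_dependency_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of package names
    (first name repeated at the end), or None if there is none.

    `deps` maps a package name to the packages it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {pkg: WHITE for pkg in deps}
    stack: List[str] = []

    def visit(pkg: str) -> Optional[List[str]]:
        color[pkg] = GRAY
        stack.append(pkg)
        for dep in deps.get(pkg, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: that is a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        stack.pop()
        color[pkg] = BLACK
        return None

    for pkg in deps:
        if color[pkg] == WHITE:
            cycle = visit(pkg)
            if cycle is not None:
                return cycle
    return None
```

With hypothetical packages, `find_dependency_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` reports the cycle through `a`, `b`, and `c`, while an acyclic set of declarations yields `None`. A check like this could run at build time regardless of whether packages are distributed in source or binary form.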