Here's a big-picture update of where we are in the new package system and the conversion of the Racket distribution to use packages.
This message covers

 - how I see things working after the package system and reorganization are done, and a report on what pieces are still missing to reach that vision;

 - a look at how we got to our current design/reorganization choices and whether we're choosing the right place; and

 - speculation on why the package changes have been so difficult to implement.

All of that makes it a long message (sorry!), but I hope this message is useful to bring us more in sync.


A Package-Based Racket
----------------------

Let's take a look at how you'll do various things in the new package-based Racket world. (There's no new information here, and parts marked with "[guess]" are especially speculative. Still, some details may be clearer than in earlier accounts, now that much of it is implemented, and I think a comprehensive review may be useful.)

** Downloading release installers from PLT

The "www.racket-lang.org" site's big blue button will provide the same installers that it does now, at least by default. That is, the content provided by the installer --- DrRacket, teaching languages, etc. --- will be pretty much the same as now.

The blue button might also provide the option of "Minimal Racket" installers, which give you something that's as small as we can make it and still provides command-line `raco pkg'.

** Downloading installers from other distributors

There are all sorts of reasons that the "main distribution" from PLT might not fit the needs of some group. Maybe the release cycle is too long or at the wrong time. Maybe it includes much too much, much too little, or almost the right amount but missing a crucial package. Maybe the group wants something almost minimal, but still with a graphical package manager. Maybe some group uses a platform for which PLT does not provide an installer.

For many of those groups, using a "Minimal Racket" installer plus selective package installations will do the trick.
For others, creating a special set of installers might be worthwhile, but there are too many reasons and too many permutations for PLT to provide installers that cover all of them. Fortunately, anyone can build a set of installers and put them on a web page, and we make it as easy as possible to build a set of installers that start with a given set of packages.

PLT could host a web page or wiki that points to other distributors. PLT might even be able to provide an automated service that generates a set of installers for a basic set of platforms.

** Compiling a release from source

In addition to installers, a download site can provide a source-code option (not specific to any platform, unlike the current source packages), which would mainly be used for building Racket on additional platforms.

This option is mostly a snapshot of the source-code repository for the core, but it includes a pre-built "collects" tree (see "technical detail", below) and a default configuration that points back to the distributor's site for pre-built packages.

** Adding or upgrading supported packages

In much the same way that you can easily install a set of supported packages on your current OS, you'll be able to easily install a set of packages that are supported by your distributor. Those packages are pre-built, so they install quickly, along with any included documentation.

Depending on the distributor and installer, packages might be downloaded and installed in "binary" form, which means that tests and source code (for libraries and documentation) are omitted from the package. PLT seems unlikely to provide such installers in the near future.

The default package scope configured by a distribution tends to be "user", which means that packages are installed in a user-specific location.

Package updates can be made available by distributors for whatever reason and on whatever timetable they see fit.

If your distribution is from PLT, then the supported packages are called "ring-0" packages.
Ring-0 packages include contributions from third parties (i.e., not just packages implemented by PLT) that are vetted and regularly tested by PLT.

[Guess:] The "Racket" and "Minimal Racket" distributions might point to different pre-built package catalogs. Possibly, the "Racket" catalog never updates packages that were included in the installer (on the grounds that the user may not have write permission to the install), while the "Minimal Racket" catalog includes more frequent updates for bug fixes (on the grounds that the user can update any installed package).

A distributor doesn't necessarily have to provide its own package catalog. It can instead supply an installer that works with packages as served by some other distributor's catalog, such as PLT's catalog. (See "technical detail" below.)

A user can also redirect `raco pkg' to a different catalog server, instead of using the configuration that was supplied by the installer.

Binary, pre-built, and source variants of a package can be "updated" to each other in any direction.

** Adding or upgrading other packages

An installer-provided configuration will normally point to a catalog of packages that are not specifically supported by the distributor but are still readily available --- probably mostly in source form and directly pulled from a git repository. In particular, "pkg.racket-lang.org" provides packages in source form.

** Reading documentation

A distribution site provides online documentation (including all supported packages) alongside installers and packages.

Many installers and packages include documentation to be installed on a user's machine, but there are some packages that provide libraries without documentation. For example, "gui-lib" provides GUI libraries without local documentation, while "gui" combines the "gui-lib" libraries with local documentation.

Sometimes, documentation that is installed locally will still refer to documentation that is not downloaded.
Such links are directed back to the distributor's site. That situation won't happen often for pre-built packages, because links that go to other packages will tend to go to packages that are dependencies. It will happen more for binary packages, because the dependency can be build-time only.

** Creating new packages

A minimal package is a directory. So, let's suppose that you have some modules in a directory that you want to turn into a package.

Suppose that your directory is called "potato", and it has a module file "eat.rkt". Turn your directory into a locally installed package with

  raco pkg install --link potato

Then, you can use "eat.rkt" with

  (require potato/eat)

To give your package to someone else, you could zip up the "potato" directory as "potato.zip", and the other person would install with

  raco pkg install potato.zip

Note that you can use any zip archiving tool, or you can use

  raco pkg create --from-install potato

to create the ".zip" file, which has the advantage that directories like "compiled" and ".git" are omitted.

Even better, maybe your directory is already on GitHub at "http://github.com/idaho/potato". Then, others can install your package with

  raco pkg install github://github.com/idaho/potato/master

If you push changes to your GitHub repository, others can get them with

  raco pkg update potato

If you're ready for the world to use your package, then go to "pkg.racket-lang.org" and point the package name "potato" at your GitHub repository. Then, not only will others know about your package, they'll be able to install it with

  raco pkg install potato

Finally, if you'd like PLT to include your package as a pre-built package with each snapshot and release, then go back to "pkg.racket-lang.org" and request ring-0 status for the package. Ring-0 status may require a few bureaucratic improvements to your package, such as including an "info.rkt" file if you don't have one already, because those details are needed to keep your package in working order.
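
As a rough sketch of the "potato" example above, the following shell script creates the directory with one module and a minimal "info.rkt". The contents of both files are my assumptions for illustration (the `eat' function, the "base" dependency, and the `#lang info' form as used in released versions of Racket), not requirements stated above, and the script deliberately stops short of running `raco pkg'.

```shell
#!/bin/sh
# Sketch only: lays out a hypothetical "potato" package directory.
# File contents are illustrative assumptions, not taken from the message.
set -eu

pkg_dir="potato"
mkdir -p "$pkg_dir"

# A module that clients would load with (require potato/eat).
cat > "$pkg_dir/eat.rkt" <<'EOF'
#lang racket/base
(provide eat)
;; Hypothetical function, present only so the module exports something.
(define (eat) "yum")
EOF

# Package metadata: the collection name and dependencies are assumed values.
cat > "$pkg_dir/info.rkt" <<'EOF'
#lang info
(define collection "potato")
(define deps '("base"))
EOF

echo "created $pkg_dir/eat.rkt and $pkg_dir/info.rkt"
```

After running it, `raco pkg install --link potato' should make `(require potato/eat)' work as described above.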

** Using the cutting edge

PLT provides one or more snapshot sites that work the same as the release site, except that each snapshot's catalog expires after a few days. When that catalog goes away, you can continue to use the snapshot, but you'll have to get packages and updates via source.

** Using the bleeding edge

A user who wants to work with the minute-by-minute latest can start by cloning the core Racket git repository and running `configure', `make', and `make install' to get a Minimal Racket build. Then, start installing packages with `raco pkg'.

The default package catalog in built-from-source Racket is "pkg.racket-lang.org", which means that you get all packages in source form from various git repositories, including for PLT-maintained packages. The default package scope is "installation".

If you run `raco pkg update -a', then you likely get updates and trigger many compiles. Eventually, an update will fail, because your core Racket version is too old, and you'll need to `git pull', `configure', `make', and `make install' --- if you haven't been doing that, anyway. Since packages were added with installation-wide scope, `make install' rebuilds your previously installed packages, too.

** Using the bleeding edge as a PLT developer

As a convenience to PLT developers, who tend to work on a particular set of packages, there is an alternate way of working on the bleeding edge (which anyone can use, if they prefer).

[Guess #1:] Instead of cloning the core Racket repo, clone a "main distribution" repo that has the core Racket repo as a submodule, plus git submodules for each of the packages that are dependencies of "main-distribution". In other words, you get something that looks like the current Racket repo, but that uses git submodules.

[Guess #2:] Instead of cloning the core Racket repo from GitHub, you clone from the "main distribution" repository, just like now.
In addition to being mirrored to GitHub directly, individual parts of the "main distribution" repo are mirrored as GitHub repositories, and the mirrors are the ones that "pkg.racket-lang.org" references.

GitHub repositories that correspond to packages (submodules in guess #1, mirrored subtrees in guess #2) are registered with "pkg.racket-lang.org", which is how users on the bleeding edge might normally get the packages.

** Becoming a distributor

If you want to create installers like PLT's, then it's simplest to clone the git repo like a PLT developer, and then use `make installers'.

Alternatively, you can use `make installers-from-catalog' to create a set of installers based on packages pulled from a specified catalog.

Either way, if you want to piggy-back on some other installer's set of pre-built packages, then there are configuration options and/or makefile targets to do that. (This is more sketchy; see below.)

** Taking your own snapshot of Racket and packages

Sometimes, you don't need to build installers, but you'd still like a snapshot of the current Racket core and packages. You might want to edit the snapshot to upgrade some packages while keeping others the same.

The `raco pkg catalog-copy' command is one of many tools to manipulate catalog servers. For packages that are mapped to GitHub repositories, merely copying a catalog doesn't archive the code, but it archives a particular commit id. It's always possible to grab a copy of a package repository and reference the copy from a catalog.


A Technical Detail
------------------

Starting from scratch twice with the same Racket sources does not lead to compatible pre-built packages, unfortunately, because bytecode files are not generated deterministically. Maybe we'll be able to fix that, one day.

Meanwhile, pre-built packages depend on a particular build of the libraries in "collects", as well as a particular build of any dependencies.
So, if a distributor wants to enable other distributors that use the same catalog of pre-built packages, the distributor must serve a "collects" tarball, too. Providing the "collects" will be built into the snapshot support.


From Here to There
------------------

The snapshot site

  http://www.cs.utah.edu/plt/snapshots/

demonstrates how much is working. Here are the remaining implementation issues:

 * Generated distribution sites do not yet include a source code option or "collects.tgz" for piggy-backing distributors, and the makefile or configuration file lacks support for piggy-backing. These seem straightforward to add.
 * The PLT-maintained packages are not yet reflected on "pkg.racket-lang.org". Because all of those packages are currently in one big git repository, it's not clear how to register the packages. Guesses #1 and #2 in "Using the bleeding edge as a PLT developer" above are two possible routes. Another is that we set up a process to pull from git and bundle package sources into individual zip archives that are registered on "pkg.racket-lang.org".
 * The `make installers' support needs to be less tied to "main-distribution". You can configure the set of packages that are built and included in installers by `make installers', but that set currently must be a subset of the packages in the "pkgs" directory of the Racket repository. It's easy in principle to pull the packages from a catalog server, but there will be some issues to sort out in the bootstrapping process and in ensuring a consistent snapshot.
 * No support yet for generated distribution sites with binary packages. Probably not too difficult. I forget what went wrong last time I tried this, but a lot has been fixed since then. In any case, the idea of binary packages does not seem to have gained much traction.
 * Package-dependency checking for tests. Maybe it's just a matter of compiling tests and sorting them into suitable packages, like everything else, which is a direction that we've already started.
 * The "main-distribution" package needs to be cleaned up. The "main-distribution" package currently includes tests, and it includes packages like "honu" that are not in the current release. This clean-up task is related to sorting out tests.
 * Different build modes are not yet configured with different default package scopes. Should be easy.

I also have a long-ish list of minor repairs and usability improvements to tackle.


From Back There to Here
-----------------------

I think the big-picture plans are probably uncontroversial. When it comes to the details of exactly how things work and how things are named, I'm hearing less confidence or less agreement. Some of us are steeped in the issues and have different opinions. Others seem overwhelmed by the details, unsure of how it will all work out, and disconcerted by conflicting messages from others who seem to understand the issues.

For people who are in that last group or close to it, it may seem overall that we're moving into a new package system too quickly. The decision to split Racket into packages has stressed our development process, because now we're tackling two hard problems instead of one: developing a package system and using it on a big pile of code. I think a good case could be made that the package system is too new to trust with a big shift. At the same time, my sense is that waiting until the package system is good enough isn't how software works; a piece of software becomes good enough for its job only when you make it do its job.

From what I hear, the issues that make people uncomfortable fit into three categories:

 * Package-system design
 * Repository organization
 * Concerns that a more distributed ecosystem means a less unified one

Let's take them one at a time.

** Package-system design

We all appreciate the work that Jay did to design the package system.
I hear lingering concern about the design, including its limited support for versioning (just dependency checks), the fact that the package system is outside the module system (no built-in auto-download of packages, although a tool like DrRacket can suggest package installs in response to missing-library exceptions), its stance on conflicts (simply disallowed), and its flat namespace (which could make conflicts more frequent).

On some of the points, I think reasonable people will disagree. We've had a years-long discussion, and we've been paying attention to precedents. We've explored some nearby alternatives to the current design (I'm thinking of single-collection versus multi-collection packages). I think we've gotten as close to consensus as possible.

** Repository organization

As we try to split the Racket repository into packages, the questions concern how finely to split the repository and how to eventually allocate packages to source-code repositories.

I think the initial split of the Racket repository went more smoothly than anyone expected. It was fairly easy, for example, to extract a relatively small core to run `raco pkg', or to draw a line between DrRacket and the teaching languages. I chalk that up to general competence among the Racket implementors: big systems must be developed in layers, whether the layers are declared or not.

In fact, it has worked out so well that the splitting of Racket into packages has taken a more aggressive form than I expected. At this point, we've split the Racket repository into 137(!) packages, and that number is still growing. Two of us tried to make a coarser split, and it didn't feel right. Others have since started shuffling packages and continue to split things further. We seem to really like declaring dependencies and reducing unrequested functionality.

Given that packages are going to be split finely, the question of allocating packages to repositories is less straightforward.
We've concluded that "scribble-lib" and "scribble-doc" are good to have as separate packages, but we don't want Scribble's implementation and its documentation to end up in separate source-code repositories. At the same time, putting everything in one big repository is intractable, at least at the point where we want packages downloaded directly from a repository. (A package can be a subdirectory of a repository, but the package manager has to download a tarball of the entire tree to extract the subdirectory.) So, under "pkgs", we have an extra layer in the directory hierarchy to reflect an intended organization into repositories. Using a layer of directories is consistent with git submodules, if we choose to go that way.

The fact that many of us have tried and arrived at the same conclusion on granularity gives me confidence that it's a reasonable conclusion, but the current Racket repository organization really does feel complex. For example, the core of `raco setup' is

  racket/lib/collects/setup/setup-unit.rkt

while the Scribble part of `raco setup' is in

  pkgs/racket-pkgs/racket-index/setup/scribble.rkt

Those paths reflect that `raco setup' is mostly core functionality, but you don't get documentation setup until you install the "racket-index" package, which is currently grouped with other almost-core packages.

This example also illustrates how the current organization relies on collection splicing in a big way. In the long run, not many collections are going to be spliced so much as, say, "racket" and "data", but splicing two or three times to separate modules, documentation, and tests may turn out to be common.

And then there's

  pkgs/drracket-pkgs/drracket/drracket/drracket.rkt
       ^             ^        ^        ^
       repo          package  collection module

Every layer before a "/" has multiple descendants, so the layers are not trivially collapsed. If you just look at the path, it seems crazy.
But if you're expecting <repo>/<package>/<collection>/<module>, then hopefully it seems reasonable.

In short, the current layout is driven by three factors: a bias toward fine-grained packages, a sense that it's good to reflect layers and dependencies via separate filesystem directories, and some constraints on how directories relate to git repositories. Unless we change those driving factors, I don't see us arriving at a simpler organization.

** Distributed versus unified ecosystem

While less prominent than the other categories, I'm also hearing some concern that splitting up the Racket repository and reorganizing various pieces of infrastructure will lead to a less unified system --- or even a less unified community.

Moving our products and infrastructure into a more distributed form is one of my main goals, but I don't think that "distributed" has to mean "fragmented". It seems to me that the more distributed we are able to make our world (the Internet, git, etc.), the more closely we are able to work together. The math behind that effect eludes me, but I believe in it, anyway.

At the same time, the sudden emphasis on reorganizing the Racket repository could also give the impression that the new package system is primarily about distributing Racket, and not about "third-party" libraries and packages. I think we're trying to make as much of our code as possible be treated as "third-party", and thus ensure that all parties are well supported.


Why Aren't We There, Yet?
-------------------------

We're hardly the first to design a package system or apply it to a big system, and I can't shake the sense most of the time that we're just reinventing the wheel. Along those lines, implementing the mechanics of the package system has been suspiciously difficult.

I hope that part of the reason is our commitment to documentation --- that it exists, that it builds reliably, that it's richly formatted, and that it is pervasively cross-referenced and hyperlinked.
I don't think that any package system delivers documentation that's anything like ours.

Could it also be an unusual commitment to relative paths, especially when distributing pre-built items? A lot of problems go away if you know that the library is going to be in "/usr/local/lib".

Surely part of it is trying to make `raco setup' fast for installing packages. It's complex and fragile to perform an incremental computation based on changes inferred from filesystem state.

Bootstrapping, at least, is known to be tricky. The Racket compiler isn't written in Racket, yet, but the installer-creator installs Racket packages to create a local installation that is used to set up packages on a remote installation that runs a Racket script to build an installer. It took many days to make that work and make it configurable.

On the plus side, `raco setup' can usefully check package dependencies and sort them into "build-time" and "run-time" dependencies, even for documentation links, and that checking was relatively easy to implement. Since module collection references can be synthesized at run time, there's no way to completely check dependencies statically, but I think we may end up with something that's more reliable and complete than checking in other package systems. If so, maybe that helps explain why it was hard.

_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev