Thanks! (For one, I found the "From Back There to Here" section particularly helpful.)
On Sat, Jul 13, 2013 at 1:56 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote:
> Here's a big-picture update of where we are in the new package system
> and the conversion of the Racket distribution to use packages.
>
> This message covers
>
>  - how I see things working after the package system and
>    reorganization is done, and a report on what pieces are still
>    missing to reach that vision;
>
>  - a look at how we got to our current design/reorganization choices
>    and whether we're choosing the right place; and
>
>  - speculation on why the package changes have been so difficult to
>    implement.
>
> All of that makes it a long message (sorry!), but I hope this message
> is useful to bring us more in sync.
>
>
> A Package-Based Racket
> ----------------------
>
> Let's take a look at how you'll do various things in the new
> package-based Racket world.
>
> (There's no new information here, and parts marked with "[guess]" are
> especially speculative. Still, some details may be clearer than in
> earlier accounts, now that much of it is implemented, and I think a
> comprehensive review may be useful.)
>
> ** Downloading release installers from PLT
>
> The "www.racket-lang.org" site's big blue button will provide the same
> installers that it does now, at least by default. That is, the content
> provided by the installer --- DrRacket, teaching languages, etc. ---
> will be pretty much the same as now.
>
> The blue button might also provide the option of "Minimal Racket"
> installers, which give you something that's as small as we can make it
> and still provides command-line `raco pkg'.
>
> ** Downloading installers from other distributors
>
> There are all sorts of reasons that the "main distribution" from PLT
> might not fit the needs of some group. Maybe the release cycle is too
> long or at the wrong time. Maybe it includes much too much, much too
> little, or almost the right amount but missing a crucial
> package.
> Maybe the group wants something almost minimal, but still
> with a graphical package manager. Maybe some group uses a platform for
> which PLT does not provide an installer.
>
> For many of those groups, using a "Minimal Racket" installer plus
> selective package installations will do the trick. For others,
> creating a special set of installers might be worthwhile, but there
> are too many reasons and too many permutations for PLT to provide
> installers that cover all of them.
>
> Fortunately, anyone can build a set of installers and put them on a
> web page, and we make it as easy as possible to build a set of
> installers that start with a given set of packages. PLT could host a
> web page or wiki that points to other distributors. PLT might even be
> able to provide an automated service that generates a set of
> installers for a basic set of platforms.
>
> ** Compiling a release from source
>
> In addition to installers, a download site can provide a source-code
> option (not specific to any platform, unlike the current source
> packages), which would mainly be used for building Racket on
> additional platforms.
>
> This option is mostly a snapshot of the source-code repository for the
> core, but it includes a pre-built "collects" tree (see "technical
> detail", below) and a default configuration that points back to the
> distributor's site for pre-built packages.
>
> ** Adding or upgrading supported packages
>
> In much the same way that you can easily install a set of supported
> packages on your current OS, you'll be able to easily install a set of
> packages that are supported by your distributor. Those packages are
> pre-built, so they install quickly, along with any included
> documentation.
>
> Depending on the distributor and installer, packages might be
> downloaded and installed in "binary" form, which means that tests and
> source code (for libraries and documentation) are omitted from the
> package.
> PLT seems unlikely to provide such installers in the near
> future.
>
> The default package scope configured by a distribution tends to be
> "user", which means that packages are installed in a user-specific
> location.
>
> Package updates can be made available by distributors for whatever
> reason and on whatever timetable they see fit.
>
> If your distribution is from PLT, then the supported packages are
> called "ring-0" packages. Ring-0 packages include contributions from
> third parties (i.e., not just packages implemented by PLT) that are
> vetted and regularly tested by PLT.
>
> [Guess:] The "Racket" and "Minimal Racket" distributions might point
> to different pre-built package catalogs. Possibly, the "Racket"
> catalog never updates packages that were included in the installer (on
> the grounds that the user may not have write permission to the
> install), while the "Minimal Racket" catalog includes more frequent
> updates for bug fixes (on the grounds that the user can update any
> installed package).
>
> A distributor doesn't necessarily have to provide its own package
> catalog. It can instead supply an installer that works with packages
> as served by some other distributor's catalog, such as PLT's
> catalog. (See "technical detail" below.)
>
> A user can also redirect `raco pkg' to a different catalog server,
> instead of using the configuration that was supplied by the
> installer. Binary, pre-built, and source variants of a package can be
> "updated" to each other in any direction.
>
> ** Adding or upgrading other packages
>
> An installer-provided configuration will normally point to a catalog
> of packages that are not specifically supported by the distributor but
> are still readily available --- probably mostly in source form and
> directly pulled from a git repository. In particular,
> "pkg.racket-lang.org" provides packages in source form.
>
> ** Reading documentation
>
> A distribution site provides online documentation (including all
> supported packages) alongside installers and packages.
>
> Many installers and packages include documentation to be installed on
> a user's machine, but there are some packages that provide libraries
> without documentation. For example, "gui-lib" provides GUI libraries
> without local documentation, while "gui" combines "gui-lib" local
> documentation and the libraries.
>
> Sometimes, documentation that is installed locally will still refer to
> documentation that is not downloaded. Such links are directed back to
> the distributor's site. That situation won't happen often for
> pre-built packages, because links that go to other packages will tend
> to go to packages that are dependencies. It will happen more for
> binary packages, because the dependency can be build-time only.
>
> ** Creating new packages
>
> A minimal package is a directory. So, let's suppose that you have some
> modules in a directory that you want to turn into a package. Suppose
> that your directory is called "potato", and it has a module file
> "eat.rkt".
>
> Turn your directory into a locally installed package with
>
>   raco pkg install --link potato
>
> Then, you can use "eat.rkt" with
>
>   (require potato/eat)
>
> To give your package to someone else, you could zip up the "potato"
> directory as "potato.zip", and the other person would install with
>
>   raco pkg install potato.zip
>
> Note that you can use any zip archiving tool, or you can use
>
>   raco pkg create --from-install potato
>
> to create the ".zip" file, which has the advantage that directories
> like "compiled" and ".git" are omitted.
>
> Even better, maybe your directory is already on GitHub at
> "http://github.com/idaho/potato".
> Then, others can install your
> package with
>
>   raco pkg install github://github.com/idaho/potato/master
>
> If you push changes to your GitHub repository, others can get them
> with
>
>   raco pkg update potato
>
> If you're ready for the world to use your package, then go to
> "pkg.racket-lang.org" and point the package name "potato" at your
> GitHub repository. Then, not only will others know about your package,
> they'll be able to install it with
>
>   raco pkg install potato
>
> Finally, if you'd like PLT to include your package as a pre-built
> package with each snapshot and release, then go back to
> "pkg.racket-lang.org" and request ring-0 status for the package.
> Ring-0 status may require a few bureaucratic improvements to your
> package, such as including an "info.rkt" file if you don't have one
> already, because those details are needed to keep your package in
> working order.
>
> ** Using the cutting edge
>
> PLT provides one or more snapshot sites that work the same as the
> release site, except that each snapshot's catalog expires after a few
> days. When that catalog goes away, you can continue to use the
> snapshot, but you'll have to get packages and updates via source.
>
> ** Using the bleeding edge
>
> A user who wants to work with the minute-by-minute latest can start by
> cloning the core Racket git repository, `configure', `make', and `make
> install' to get a Minimal Racket build. Then, start installing
> packages with `raco pkg'.
>
> The default package catalog in built-from-source Racket is
> "pkg.racket-lang.org", which means that you get all packages in source
> form from various git repositories, including for PLT-maintained
> packages. The default package scope is "installation".
>
> If you run `raco pkg update -a', then you likely get updates and
> trigger many compiles.
> Eventually, an update will fail, because your
> core Racket version is too old, and you'll need to `git pull',
> `configure', `make', and `make install' --- if you haven't been doing
> that, anyway. Since packages were added with installation-wide scope,
> `make install' rebuilds your previously installed packages, too.
>
> ** Using the bleeding edge as a PLT developer
>
> As a convenience to PLT developers, who tend to work on a particular
> set of packages, there is an alternate way of working on the bleeding
> edge (which anyone can use, if they prefer).
>
> [Guess #1:] Instead of cloning the core Racket repo, clone a "main
> distribution" repo that has the core Racket repo as a submodule, plus
> git submodules for each of the packages that are dependencies of
> "main-distribution". In other words, you get something that looks like
> the current Racket repo, but that uses git submodules.
>
> [Guess #2:] Instead of cloning the core Racket repo from GitHub, you
> clone from the "main distribution" repository, just like now. In
> addition to being mirrored to GitHub directly, individual parts of the
> "main distribution" repo are mirrored as GitHub repositories, and
> the mirrors are the ones that "pkg.racket-lang.org" references.
>
> GitHub repositories that correspond to packages (submodules in guess
> #1, mirrored subtrees in guess #2) are registered with
> "pkg.racket-lang.org", which is how users on the bleeding-edge might
> normally get the packages.
>
> ** Becoming a distributor
>
> If you want to create installers like PLT's, then it's simplest to
> clone the git repo like a PLT developer, and then use `make
> installers'.
>
> Alternatively, you can use `make installers-from-catalog' to create a
> set of installers based on packages pulled from a specified catalog.
>
> Either way, if you want to piggy-back on some other installer's set of
> pre-built packages, then there will be configuration options and/or
> makefile targets to do that.
> (This is more sketchy; see below.)
>
> ** Taking your own snapshot of Racket and packages:
>
> Sometimes, you don't need to build installers, but you'd still like a
> snapshot of the current Racket core and packages. You might want to
> edit the snapshot to upgrade some packages while keeping others the
> same.
>
> The `raco pkg catalog-copy' command is one of many tools to manipulate
> catalog servers. For packages that are mapped to GitHub repositories,
> merely copying a catalog doesn't archive the code, but it archives a
> particular commit id. It's always possible to grab a copy of a package
> repository and reference the copy from a catalog.
>
>
> A Technical Detail
> ------------------
>
> Starting from scratch twice with the same Racket sources does not lead
> to compatible pre-built packages, unfortunately, because bytecode
> files are not generated deterministically. Maybe we'll be able to fix
> that, one day.
>
> Meanwhile, pre-built packages depend on a particular build of the
> libraries in "collects", as well as a particular build of any
> dependencies. So, if a distributor wants to enable other distributors
> that use the same catalog of pre-built packages, the distributor must
> serve a "collects" tarball, too. Providing the "collects" will be
> built into the snapshot support.
>
>
> From Here to There
> ------------------
>
> The snapshot site
>
>   http://www.cs.utah.edu/plt/snapshots/
>
> demonstrates how a lot is working.
>
> Here are the remaining implementation issues:
>
>  * Generated distribution sites do not yet include a source code
>    option or "collects.tgz" for piggy-backing distributors, and the
>    makefile or configuration file lacks support for piggy-backing.
>
>    These seem straightforward to add.
>
>  * The PLT-maintained packages are not yet reflected on
>    "pkg.racket-lang.org".
>
>    Because all of those packages are currently in one big git
>    repository, it's not clear how to register the packages.
>    Guesses #1
>    and #2 in "Using the bleeding edge as a PLT developer" above are two
>    possible routes. Another is that we set up a process to pull from
>    git and bundle package sources into individual zip archives that are
>    registered on "pkg.racket-lang.org".
>
>  * The `make installers' support needs to be less tied to
>    "main-distribution".
>
>    You can configure the set of packages that are built and included
>    in installers by `make installers', but that set currently must
>    be a subset of the packages in the "pkgs" directory of the Racket
>    repository. It's easy in principle to pull the packages from a
>    catalog server, but there will be some issues to sort out in the
>    bootstrapping process and in ensuring a consistent snapshot.
>
>  * No support yet for generated distribution sites with binary
>    packages.
>
>    Probably not too difficult. I forget what went wrong last time I
>    tried this, but a lot has been fixed since then. In any case, the
>    idea of binary packages does not seem to have gained much traction.
>
>  * Package-dependency checking for tests.
>
>    Maybe it's just a matter of compiling tests and sorting them into
>    suitable packages, like everything else, which is a direction that
>    we've already started.
>
>  * The "main-distribution" package needs to be cleaned up.
>
>    The "main-distribution" package currently includes tests, and it
>    includes packages like "honu" that are not in the current release.
>    This clean-up task is related to sorting out tests.
>
>  * Different build modes are not yet configured with different
>    default package scopes.
>
>    Should be easy.
>
> I also have a long-ish list of minor repairs and usability
> improvements to tackle.
>
>
> From Back There to Here
> -----------------------
>
> I think the big-picture plans are probably uncontroversial.
>
> When it comes to the details of exactly how things work and how things
> are named, I'm hearing less confidence or less agreement.
> Some of us
> are steeped in the issues and have different opinions. Others seem
> overwhelmed by the details, unsure of how it will all work out, and
> disconcerted by conflicting messages from others who seem to
> understand the issues. For people who are in that last group or close
> to it, it may seem overall that we're moving into a new package system
> too quickly.
>
> The decision to split Racket into packages has stressed our
> development process, because now we're tackling two hard problems
> instead of one: developing a package system and using it on a big pile
> of code. I think a good case could be made that the package system is
> too new to trust with a big shift. At the same time, my sense is that
> waiting until the package system is good enough isn't how software
> works; a piece of software becomes good enough for its job only when
> you make it do its job.
>
> From what I hear, the issues that make people uncomfortable fit into
> three categories:
>
>  * Package-system design
>
>  * Repository organization
>
>  * Concerns that a more distributed ecosystem means a less unified one
>
> Let's take them one at a time.
>
> ** Package-system design
>
> We all appreciate the work that Jay did to design the package
> system. I hear lingering concern about the design, including its
> limited support for versioning (just dependency checks), the fact that
> the package system is outside the module system (no built-in
> auto-download of packages, although a tool like DrRacket can suggest
> package installs in response to missing-library exceptions), its
> stance on conflicts (simply disallowed), and its flat namespace (which
> could make conflicts more frequent).
>
> On some of the points, I think reasonable people will disagree. We've
> had a years-long discussion, and we've been paying attention to
> precedents. We've explored some nearby alternatives to the current
> design (I'm thinking of single-collection versus multi-collection
> packages).
> I think we've gotten as close to consensus as possible.
>
> ** Repository organization
>
> As we try to split the Racket repository into packages, the questions
> concern how finely to split the repository and how to eventually
> allocate packages to source-code repositories.
>
> I think the initial split of the Racket repository went more smoothly
> than anyone expected. It was fairly easy, for example, to extract a
> relatively small core to run `raco pkg', or to draw a line between
> DrRacket and the teaching languages. I chalk that up to general
> competence among the Racket implementors: big systems must be
> developed in layers, whether the layers are declared or not.
>
> In fact, it has worked out so well that the splitting of Racket into
> packages has taken a more aggressive form than I expected. At this
> point, we've split the Racket repository into 137(!) packages, and
> that number is still growing. Two of us tried to make a coarser split,
> and it didn't feel right. Others have since started shuffling packages
> and continue to split things further. We seem to really like declaring
> dependencies and reducing unrequested functionality.
>
> Given that packages are going to be split finely, the question of
> allocating packages to repositories is less straightforward. We've
> concluded that "scribble-lib" and "scribble-doc" are good to have
> as separate packages, but we don't want Scribble's
> implementation and its documentation to end up in separate
> source-code repositories. At the same time, putting everything in one
> big repository is intractable, at least at the point where we want
> packages downloaded directly from a repository. (A package can be a
> subdirectory of a repository, but the package manager has to download
> a tarball of the entire tree to extract the subdirectory.) So, under
> "pkgs", we have an extra layer in the directory hierarchy to reflect
> an intended organization into repositories.
> Using a layer of
> directories is consistent with git submodules, if we choose to go that
> way.
>
> The fact that many of us have tried and arrived at the same conclusion
> on granularity gives me confidence that it's a reasonable conclusion,
> but the current Racket repository organization really does feel
> complex. For example, the core of `raco setup' is
>
>   racket/lib/collects/setup/setup-unit.rkt
>
> while the Scribble part of `raco setup' is in
>
>   pkgs/racket-pkgs/racket-index/setup/scribble.rkt
>
> Those paths reflect that `raco setup' is mostly core functionality,
> but you don't get documentation setup until you install the
> "racket-index" package, which is currently grouped with other
> almost-core packages.
>
> This example also illustrates how the current organization relies on
> collection splicing in a big way. In the long run, not many
> collections are going to be spliced so much as, say, "racket" and
> "data", but splicing two or three times to separate modules,
> documentation, and tests may turn out to be common.
>
> And then there's
>
>   pkgs/drracket-pkgs/drracket/drracket/drracket.rkt
>        ^             ^        ^          ^
>        repo          package  collection module
>
> Every layer before a "/" has multiple descendants, so the layers are
> not trivially collapsed. If you just look at the path, it seems
> crazy. But if you're expecting <repo>/<package>/<collection>/<module>,
> then hopefully it seems reasonable.
>
> In short, the current layout is driven by three factors: a bias toward
> fine-grained packages, a sense that it's good to reflect layers and
> dependencies via separate filesystem directories, and some constraints
> on how directories relate to git repositories. Unless we change those
> driving factors, I don't see us arriving at a simpler organization.
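
The <repo>/<package>/<collection>/<module> reading of those paths can be sketched in a few lines of Racket. This is only an illustration of the layout as described above, not part of the package system: the function name `layout-layers' is hypothetical, and it assumes a repository-relative path that starts with "pkgs/".

```racket
#lang racket/base
(require racket/string)

;; layout-layers : string -> (or/c hash? #f)
;; Hypothetical helper: splits a path of the form
;;   pkgs/<repo>/<package>/<collection>/<module...>
;; into its layers, or returns #f if the path does not have that shape.
(define (layout-layers path)
  (define parts (string-split path "/"))
  (and (>= (length parts) 5)
       (equal? (car parts) "pkgs")
       (hash 'repo       (list-ref parts 1)
             'package    (list-ref parts 2)
             'collection (list-ref parts 3)
             ;; anything after the collection is the module path
             'module     (string-join (list-tail parts 4) "/"))))

(layout-layers "pkgs/drracket-pkgs/drracket/drracket/drracket.rkt")
(layout-layers "pkgs/racket-pkgs/racket-index/setup/scribble.rkt")
```

For the second path, the collection layer is "setup" while the package layer is "racket-index", which is the collection splicing mentioned above: the "setup" collection gets modules from both the core and the "racket-index" package.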
>
> ** Distributed versus unified ecosystem
>
> While less prominent than the other categories, I'm also hearing some
> concern that splitting up the Racket repository and reorganizing
> various pieces of infrastructure will lead to a less unified system
> --- or even a less unified community.
>
> Moving our products and infrastructure into a more distributed form is
> one of my main goals, but I don't think that "distributed" has to mean
> "fragmented". It seems to me that the more distributed we are able to
> make our world (the Internet, git, etc.), the more closely we are able
> to work together. The math behind that effect eludes me, but I believe
> in it, anyway.
>
> At the same time, the sudden emphasis on reorganizing the Racket
> repository could also give the impression that the new package system
> is primarily about distributing Racket, and not about "third-party"
> libraries and packages. I think we're trying to make as much of our
> code as possible treated as "third-party", and thus ensure that all
> parties are well supported.
>
>
> Why Aren't We There, Yet?
> -------------------------
>
> We're hardly the first to design a package system or apply it to a big
> system, and I can't shake the sense most of the time that we're just
> reinventing the wheel. Along those lines, implementing the mechanics
> of the package system has been suspiciously difficult.
>
> I hope that part of the reason is our commitment to documentation ---
> that it exists, that it builds reliably, that it's richly formatted,
> and that it is pervasively cross-referenced and hyperlinked. I don't
> think that any package system delivers documentation that's anything
> like ours.
>
> Could it also be an unusual commitment to relative paths, especially
> when distributing pre-built items? A lot of problems go away if you
> know that the library is going to be in "/usr/local/lib".
>
> Surely part of it is trying to make `raco setup' fast for installing
> packages.
> It's complex and fragile to perform an incremental
> computation based on changes inferred from filesystem state.
>
> Bootstrapping, at least, is known to be tricky. The Racket compiler
> isn't written in Racket, yet, but the installer-creator installs
> Racket packages to create a local installation that is used to set up
> packages on a remote installation that runs a Racket script to build
> an installer. It took many days to make that work and make it
> configurable.
>
> On the plus side, `raco setup' can usefully check package dependencies
> and sort them into "build-time" and "run-time" dependencies, even for
> documentation links, and that checking was relatively easy to
> implement. Since module collection references can be synthesized at
> run time, there's no way to completely check dependencies statically,
> but I think we may end up with something that's more reliable and
> complete than checking in other package systems. If so, maybe that
> helps explain why it was hard.
>
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev
>
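
On the "build-time" versus "run-time" split: a package can declare the two kinds of dependency separately in its "info.rkt". The fragment below is a minimal sketch for the hypothetical "potato" package from the example above. It assumes the `collection', `deps', and `build-deps' fields as documented for released versions of Racket (6.0 and later), which may differ in detail from the in-progress design described in this thread, and the dependency lists themselves are illustrative guesses.

```racket
#lang info
;; Hypothetical "info.rkt" for the "potato" package.

;; The package provides a single collection named "potato",
;; so that "eat.rkt" is reachable as potato/eat.
(define collection "potato")

;; Run-time dependencies: needed to use the installed libraries.
(define deps '("base"))

;; Build-time dependencies: needed only to build documentation
;; or run tests (an assumed list, for illustration).
(define build-deps '("scribble-lib" "racket-doc" "rackunit-lib"))
```

With a split like this, a binary form of the package would need only the `deps' entries to be installed alongside it.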