I've added "submodules" to a version of Racket labeled v5.2.900.1 that's here:
  https://github.com/mflatt/submodules

After we've sorted out any controversial parts of the design and after
the documentation is complete, then I'll be ready to merge to the main
Racket repo.


Why Submodules?
---------------

Using submodules, you can abstract (via macros) over a set of modules
that have distinct dynamic extents and/or bytecode load times. You can
also get a private communication channel (via binding) from a module to
its submodules.

Some uses:

 * When you run a module via `racket', if it has a `main' submodule,
   then the `main' submodule is instantiated --- but not the `main'
   submodules of any other modules used by the starting module. This
   protocol is implemented for `racket', but not yet for DrRacket.

 * Languages with separate read-time, configure-time, and run-time code
   can be defined in a single module, with the configure-time and
   read-time code in submodules.

 * A testing macro could collect test cases and put them into a
   separate `test' submodule, so that testing code is not run or even
   loaded when the module is used normally.

 * An improved `scribble/srcdoc' can expose documentation through a
   submodule instead of through re-expansion hacks.

 * If you want to export certain of a module's bindings only when
   explicitly requested (i.e., not when the module is `require'd
   normally), you can export the bindings from a submodule, instead.

When I first started talking about these problems last summer, I called
the solution sketch "facets" or "modulets", but the design has evolved
into "submodules".


Nesting `module'
----------------

Given the term "submodule", the first thing that you're likely to try
will work as expected:

  #lang racket/base

  (module zoo racket/base
    (provide tiger)
    (define tiger "Tony"))

  (require 'zoo)

  tiger

Within `module', a module path of the form `(quote id)' refers to the
submodule `id', if any. If there's no such submodule, then `(quote id)'
refers to an interactively declared module, as before.

Submodules can be nested.
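
As a small sketch of nesting combined with the `(quote id)' rule, the
following variant puts a submodule inside `zoo' and has `zoo' itself
require it by its quoted name. The `reptile-house' module and the
`snake' binding are made up for illustration, and I'm assuming that the
quote shorthand resolves the same way one level down as it does at the
top of a file.

```racket
#lang racket/base

(module zoo racket/base
  ;; A submodule nested inside the `zoo' submodule (hypothetical names).
  (module reptile-house racket/base
    (provide snake)
    (define snake "Kaa"))
  ;; Inside `zoo', the quoted name refers to zoo's own submodule.
  (require 'reptile-house)
  ;; Re-export the nested binding alongside zoo's own.
  (provide tiger snake)
  (define tiger "Tony"))

(require 'zoo)

(list tiger snake)
```

Here the outer module never names `reptile-house' directly; it only
sees what `zoo' chooses to re-export.
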
To access a submodule from outside the enclosing module, use the
`submod' module path form:

  #lang racket/base

  (module zoo racket/base
    (module monkey-house racket/base
      (provide monkey)
      (define monkey "Curious George"))
    (displayln "Ticket, please"))

  (require (submod 'zoo monkey-house))

  monkey

The 'zoo module path above is really a shorthand for `(submod "." zoo)',
where "." means the enclosing module and `zoo' is its submodule. You
could write `(submod "." zoo monkey-house)' in place of
`(submod 'zoo monkey-house)'.

Note that `zoo' and `monkey-house' are not bound as identifiers in the
module above --- just like `module' doesn't add any top-level bindings.
The namespace of modules remains separate from the namespace of
variables and syntax. Along those lines, submodules are not explicitly
exported, because they are implicitly public.

When you run the above program, "Ticket, please" is *not* displayed.
Unless a module `require's a submodule, instantiating the module does
not instantiate the submodule. Similarly, instantiating a submodule
does not imply instantiating its enclosing module.

Furthermore, if you compile the above example to bytecode and run it,
the bytecode for `zoo' is not loaded. Only the bytecode for the
top-level module and `monkey-house' is loaded.


Nesting `module*'
-----------------

Submodules declared with `module' are declared locally while expanding
a module body, which means that the submodules can be `require'd
afterward by the enclosing module. This ordering means, however, that
the submodule cannot `require' the enclosing module. The submodule also
sees no bindings of the enclosing module; it starts with an empty
lexical context.

The `module*' form is like `module', but it can be used only for
submodules, and it defers the submodule's expansion until after the
enclosing module is otherwise expanded. As a result, a submodule using
`module*' can `require' its enclosing module, while the enclosing
module cannot require the submodule. A ".."
in a `submod' form goes up the submodule hierarchy, so that
`(submod "." "..")' is a reference to the enclosing module:

  #lang racket/base

  (module aquarium racket/base
    (provide fish)
    (define fish '(1 2))
    (module* book racket/base
      (require (submod "." ".."))
      (append fish '(red blue))))

  (require (submod 'aquarium book))

Instead of `require'ing its enclosing module, a `module*' form can use
`#f' as its language, in which case its lexical context starts with all
of the bindings of the enclosing module (implicitly imported) instead
of with an empty lexical context. As a result, the submodule can access
bindings of the enclosing module that are not exported:

  #lang racket/base

  (module aquarium racket/base
    (define fish '(1 2))
    (module* book #f
      (append fish '(red blue))))

  (require (submod 'aquarium book))

A common use of `module*' is likely to be with `main', since `racket'
will load a `main' submodule (after `require'ing its enclosing module)
for a module named on its command line. For example, if you run this
program via `racket':

  #lang racket/base
  (provide fish)
  (define fish '(2 1))
  (module* main #f
    (unless (apply < fish)
      (error "fish are not sorted")))

then you get a "fish are not sorted" error, but if you `require' the
file into another program, you get a `fish' binding with no error.


The new `#lang'
---------------

The `#lang' reader form was previously defined as a shorthand for
`#reader' where the name after the `#lang' is mangled by adding
"/lang/reader". With submodules, `#lang' first tries using the name
as-is and checking for a `reader' submodule; if it is found, then the
submodule is used instead of mangling the name with "/lang/reader",
otherwise it falls back to the old behavior.
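
To make the lookup order concrete, here is a rough sketch that only
computes the two module paths in the order described above. The
`lang-reader-candidates' function is hypothetical and written just for
this illustration; it is not how the reader is implemented, and it does
not check whether either module actually exists.

```racket
#lang racket/base

;; Hypothetical helper for illustration only; it is not part of Racket.
;; Given a language name as a symbol, list the module paths that the
;; described `#lang' lookup would try, in order.
(define (lang-reader-candidates name)
  (list
   ;; 1. New behavior: the name as-is, looking for a `reader' submodule.
   `(submod ,name reader)
   ;; 2. Old behavior: the name mangled by adding "/lang/reader".
   (string->symbol (format "~a/lang/reader" name))))

(lang-reader-candidates 'ocean)
```

For `#lang ocean', the first candidate is the `reader' submodule of the
`ocean' module, and the second is the old-style `ocean/lang/reader'
module.
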
So, if you want to define an `ocean' language that is `racket/base'
plus `fish', it's enough to install the following module as "main.rkt"
in an "ocean" collection:

  #lang racket/base
  (provide (all-from-out racket/base)
           fish)
  (define fish '(1 2 3))
  (module reader syntax/module-reader
    #:language 'ocean)


Backwards Incompatibility
-------------------------

The biggest incompatibility is that `resolved-module-path-name' can
return a list when the module path refers to a submodule, in addition
to the old path and symbol results. Most code that calls
`resolved-module-path-name' will have to be updated.

The `submod' form is a new primitive module-path form, so module name
resolvers also must be updated.

Finally, a load/use-compiled handler must accept a list as the
expected-module name, which usually indicates that a submodule is being
loaded; the list can start with `#f' to indicate that the module should
only be loaded if it can be loaded independently from bytecode (i.e.,
without triggering the declaration of any other submodule, which means
not loading from source). Furthermore, when a submodule is requested,
no error should be raised if the enclosing module is unavailable, which
allows speculative checking for submodule declarations.

The bytecode format has changed, and the `mod' structure type from
`compiler/zo-parse' has two new fields: one for "pre" submodules (i.e.,
those declared with `module') and one for "post" submodules (i.e.,
those declared with `module*'). Any code that uses `compiler/zo-parse'
will have to change.

If you compile a `module' form and it has submodules, then when you
write the bytecode, all of the modules are written together. If the
`module' is not inside a larger top-level sequence, then the printed
form starts with a table that can be used to find any individual
submodule, which is how independent loading of submodules works.
If you just `read' the table in, though, it returns a compiled-module
value that contains submodules, and `eval'ing the compiled module
declares all the submodules, too. This protocol makes lots of `compile'
and `eval' code work without modification. The `get-module-code'
function from `syntax/modcode', meanwhile, gives you more control,
along with functions like `module-compiled-submodules' to get or adjust
the submodule list in a compiled-module value.


Design Issues
-------------

The `submod' syntax --- especially "." and ".." --- is arbitrary. The
`submod' name isn't great, but I like it the best among the options
that I tried. I'm not sure whether the association of "." and ".." to
filesystem paths is helpfully mnemonic or unhelpfully confusing.

The handling of `quote' paths within a module is also arbitrary, but
it's intended to smooth the connection between the top level and a
module body.

Overloading `module' for submodules is questionable; again, though, I
like how it roughly matches interactive evaluation. For the
post-submodule form, then, `module*' seems like the obvious choice.

As things stand, the ugly pattern `(module* main #f ...)' would be
common. Probably we should have a macro that expands to
`(module* main #f ...)'. Should the macro be called `main'?

I haven't tried to build a test-collecting macro or a `scribble/srcdoc'
replacement. I think they will work with this submodule design, but I
can't be sure until we try it.

_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev