Hi David, this is a very good and interesting writeup. I will need some time to think about it. Your design might indeed be better; it is often hard to guess how these things will work out in practice.
But at first glance, it does trade manual management of internal/external symbols using `esc` for sometimes-manual management of contexts. Macro writers would have to deal with things that they think of as symbols but that are not symbols. This is exactly what I was trying to avoid. I doubt we're at a global optimum, but this tradeoff was chosen carefully.

On Sun, Feb 9, 2014 at 3:16 PM, David Moon wrote:

> Hygienic macros could be both better and simpler. Julia's hygiene only works in simple cases, I think, and requires too much manual intervention. This is because it is done in the wrong place, in the output of macro expansion, so the macro expander has to guess the context of each symbol. Hygiene ought to be done in the input to macro expansion, where the originating context of every symbol is known. The `esc` function is a dead giveaway that something is wrong. It could be eliminated, which would make macros simpler to define. Errors like the one in the manual's sample definition of `assert` would no longer occur.
>
> I believe Julia's macros are taken directly from Scheme. Scheme never found a fully satisfactory solution to the hygiene problem, because of the inflexibility of S-expressions. I have been thinking about this issue for quite a few years.
>
> Fortunately the problem is easily solved in Julia, where the representation of expressions is more flexible. The key observation is that there should be two kinds of symbols in the expansion of a macro, which I will call "external" and "internal." Internal symbols come from the macro definition; external symbols come from the macro call. Informally, when used as variable names, external symbols are visible to the caller of the macro and internal symbols are not. Other uses for symbols, such as terminals in the grammar (`else`), literals (`:foo`), and field names (`.x`), do not distinguish internal from external.
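For reference, the current hygiene behavior the message criticizes can be seen with a variant of the Julia manual's `zerox` example. This sketch uses modern Julia 1.x, not the 2014 version under discussion. The names `@zerox_hygienic`, `with_hygiene`, and `with_esc` are hypothetical, added for illustration.

Without `esc`, the expander treats the assigned `x` as a fresh local, so the caller's `x` is untouched. With `esc`, the assignment reaches the caller's variable.

```julia
# Current (Julia 1.x) hygiene: variables assigned in a macro expansion are renamed
# to fresh locals unless the expression is wrapped in `esc`.
macro zerox_hygienic()          # hypothetical name, for comparison only
    return :(x = 0)             # `x` becomes a new gensym'd local; caller's `x` untouched
end

macro zerox()
    return esc(:(x = 0))        # escaped: `x` refers to the caller's variable
end

function with_hygiene()
    x = 1
    @zerox_hygienic
    return x                    # still 1
end

function with_esc()
    x = 1
    @zerox
    return x                    # now 0
end

println(with_hygiene(), " ", with_esc())   # prints "1 0"
```

Under the quoted proposal, the unescaped form would behave like `with_hygiene` by default. A macro that really wants the caller's `x` would interpolate a plain (external) symbol instead of calling `esc`.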
>
> More formally, when a variable name is looked up in a scope, an internal symbol only matches the same internal symbol. Thus binding an internal symbol does not capture a reference to an external symbol, and vice versa. When an internal symbol is not found in the current scope and its parents, the external symbol with the same name is looked up in the scope where the macro was defined. Thus free references in a macro expansion will have the intended meaning. They are looked up in the macro definition scope when the reference came from the macro, and in the macro call scope when the reference came from the caller.
>
> How can this be implemented? External symbols are the plain old symbols that already exist. Internal symbols are a new AST type with two fields, name and context. The name is the corresponding external symbol. The context remembers the scope where the macro was defined and is a unique object freshly created for each macro call. The `quote` construct converts literal external symbols to internal symbols. Interpolated data and literal internal symbols are left alone. The unary colon short form of `quote` behaves the same way. Local variable names can be internal symbols, and variable binding lookup is adjusted as described above: two internal symbols only match if both their names and their contexts are the same. Uses of symbols other than as variable names are modified to treat internal symbols the same as external ones.
>
> The `esc` function is no longer needed and should be removed. Any expression that originated in the macro call is automatically escaped. Now the `assert` example in the manual actually works. If a macro like the `zerox` example in the manual needs to put an external symbol into the expansion, it just uses a plain old symbol:
>
> `macro zerox() :($(symbol("x")) = 0) end`
>
> When a global variable is defined in a module, if its name is an internal symbol, it is converted to an external symbol so that it is generally visible.
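The lookup rule described above can be modeled in a few lines of Julia. This is a hypothetical simulation, not how Julia (or the proposal) is actually implemented. The names `Context`, `InternalSym`, `Scope`, `findbinding`, `lookup`, and `defineglobal!` are illustrative, and scopes are modeled as plain dictionaries rather than real compiler environments.

The rule works in two steps:

- An internal symbol matches only a binding with the same name and the same (identity-compared) context.
- Failing that, its name is looked up in the macro's definition scope.

```julia
# Hypothetical model of the proposed lookup rule; none of these types exist in Julia.

# One fresh Context per macro call. Being mutable, it is compared by identity (===).
mutable struct Context
    defscope::Dict{Symbol,Any}   # bindings visible where the macro was defined
end

# Internal symbol: the external name plus the context of the macro call that made it.
# Two InternalSyms are equal only if the names match and the contexts are the same object.
struct InternalSym
    name::Symbol
    context::Context
end

# A lexical scope whose keys are plain Symbols (external) or InternalSyms (internal).
struct Scope
    bindings::Dict{Any,Any}
    parent::Union{Scope,Nothing}
end

# Walk the scope chain looking for an exact key match.
function findbinding(scope::Union{Scope,Nothing}, key)
    while scope !== nothing
        haskey(scope.bindings, key) && return (true, scope.bindings[key])
        scope = scope.parent
    end
    return (false, nothing)
end

# External symbols are looked up in the ordinary way (the macro call scope).
function lookup(scope::Scope, s::Symbol)
    found, value = findbinding(scope, s)
    found || error("undefined variable $s")
    return value
end

# Internal symbols first look for their own binding; failing that, the external
# symbol with the same name is looked up in the macro definition scope.
function lookup(scope::Scope, s::InternalSym)
    found, value = findbinding(scope, s)
    found && return value
    haskey(s.context.defscope, s.name) && return s.context.defscope[s.name]
    error("undefined variable $(s.name) (internal)")
end

# A global defined under an internal name is stored under the external name.
defineglobal!(globals::Dict{Symbol,Any}, s::InternalSym, value) = (globals[s.name] = value)
defineglobal!(globals::Dict{Symbol,Any}, s::Symbol, value) = (globals[s] = value)

# Usage: the caller has `x` and its own `helper`; the macro was defined where
# `helper` means something else, and its expansion binds an internal `x`.
caller = Scope(Dict{Any,Any}(:x => 1, :helper => "caller's helper"), nothing)
ctx = Context(Dict{Symbol,Any}(:helper => "definition's helper"))
expansion = Scope(Dict{Any,Any}(InternalSym(:x, ctx) => 99), caller)

lookup(expansion, :x)                          # 1: the internal `x` does not capture it
lookup(expansion, InternalSym(:x, ctx))        # 99
lookup(expansion, InternalSym(:helper, ctx))   # "definition's helper"
```

Using a mutable `Context` gives each macro call a distinct identity for free. Two expansions of the same macro therefore never see each other's internal bindings.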
>
> Where does the `quote` construct get the context when it makes an internal symbol? There are several ways it could be done. I prefer for `quote` to use the value of the variable `context`; if there is no variable with that name in scope, it is an error. The `macro` statement implicitly defines the external symbol `context` in the expander function. Users who want to break parts of the expander function into separate functions must pass the context around explicitly. Users who want to build expressions outside of a macro must define `context`. It may be useful to have a user-callable `Context` constructor that takes a module as its argument.
>
> Because internal symbols are not interned, there may be a speed decrease in some cases. However, this cost is only incurred at compile time.
>
> Macro-defining macros work, provided that when `quote` sees a literal internal symbol it copies it unchanged into the expression being constructed. Thus the expansion of a macro defined by a macro-defining macro may contain internal symbols whose context comes from either macro. In the same way, recursive or nested macros work, with each internal symbol remembering the context where it originated.
>
> I have no strong opinion on whether macros may be defined in a non-top-level context. If not, the macro definition scope remembered in an internal symbol's context is just a module.
>
> A literal symbol is no longer the same thing as constructing an expression that consists only of a literal variable name. The former produces an external symbol; the latter produces an internal symbol. One approach would be to disallow `:x` for a literal symbol and require `symbol("x")` to be used, but the verbosity might be unpopular. Another approach would be to treat `:x` as a special case; if you want to produce an internal symbol `x`, you must use `quote x end` or `:(x)`.
>
> Incompatible changes here:
> - Remove `esc` (or make it a no-op).
> - `quote` no longer works if `context` is not defined.
> - The same applies to unary colon, unless the argument is just a symbol.
>
> Maybe the existing macros are good enough for your purposes, but I think the hygiene could work better. What do you think?
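The message gives two rules for `quote`:

- Literal external symbols become internal.
- Literal internal symbols are copied unchanged, which is what lets macro-defining macros work.

Both can be sketched as a tree rewrite. This is again a hypothetical model: `internalize`, `Context`, and `InternalSym` are illustrative names. It assumes the rewrite runs on the literal template before interpolation, since afterwards literal and interpolated symbols can no longer be told apart. In Julia's AST, literals such as `:foo` and field names such as `.x` appear as `QuoteNode`s, so they are left external.

```julia
# Hypothetical model of the proposed `quote` rewrite; not Julia's real quote.

mutable struct Context          # one fresh object per macro call; compared by identity
    defscope::Dict{Symbol,Any}  # macro definition scope (unused by this rewrite)
end

struct InternalSym
    name::Symbol
    context::Context
end

# Plain symbols in the literal template become internal symbols of this call.
internalize(s::Symbol, ctx::Context) = InternalSym(s, ctx)
# Literal internal symbols (e.g. produced by a macro-defining macro) are kept as-is,
# so they still remember the context where they originated.
internalize(s::InternalSym, ::Context) = s
# `:foo` literals and `.x` field names are QuoteNodes: they stay external.
internalize(q::QuoteNode, ::Context) = q
# Numbers, strings, and other atoms are unchanged.
internalize(x, ::Context) = x
# Recurse into sub-expressions; the head (:call, :(=), :., ...) is syntax, not a variable.
internalize(ex::Expr, ctx::Context) =
    Expr(ex.head, (internalize(a, ctx) for a in ex.args)...)

# Usage: `tmp`, `a`, `f`, and even `+` become internal (free ones would then be
# resolved in the macro definition scope), while `.x` and `:lit` stay external.
ctx = Context(Dict{Symbol,Any}())
ex = internalize(:(tmp = a.x + f(:lit, 1)), ctx)

# Macro-defining macro: an internal symbol from the outer macro survives a second rewrite.
outer = Context(Dict{Symbol,Any}())
inner = Context(Dict{Symbol,Any}())
nested = internalize(Expr(:call, InternalSym(:helper, outer), :y), inner)
```

Treating every non-`QuoteNode` symbol as a variable name is a simplification. A real implementation would need to follow each syntactic form more carefully.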
