Hygienic macros could be both better and simpler. Julia's hygiene only
works in simple cases, I think, and requires too much manual intervention.
This is because it is done in the wrong place, in the output of macro
expansion, so the macro expander has to guess the context of each symbol.
Hygiene ought to be done in the input to macro expansion, where the
originating context of every symbol is known. The *esc* function is a dead
giveaway that something is wrong. It could be eliminated, which would
make macros simpler to define. Errors like the one in the manual's sample
definition of *assert* would no longer occur.
I believe Julia's macros are taken directly from Scheme. Scheme never
found a fully satisfactory solution to the hygiene problem, because of the
inflexibility of S-expressions. I have been thinking about this issue for
quite a few years.
Fortunately the problem is easily solved in Julia, where the representation
of expressions is more flexible. The key observation is that there should
be two kinds of symbols in the expansion of a macro, which I will call
"external" and "internal." Internal symbols come from the macro
definition, external symbols come from the macro call. Informally,
external symbols are visible to the caller of the macro and internal
symbols are not, when used as variable names. Other uses for symbols, such
as terminals in the grammar ("else"), literals (":foo"), and field names
(".x") do not distinguish internal from external.
More formally, when a variable name is looked up in a scope, an internal
symbol only matches the same internal symbol. Thus binding an internal
symbol does not capture a reference to an external symbol, and vice versa.
When an internal symbol is not found in the current scope and its parents,
the external symbol with the same name is looked up in the scope where the
macro was defined. Thus free references in a macro expansion will have the
intended meaning, being looked up in the macro definition scope when the
reference came from the macro, but in the macro call scope when the
reference came from the caller.
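
To make the lookup rule concrete, here is a rough sketch that models it with ordinary Julia data structures. Nothing in it is an existing Julia API: *Scope*, *Context*, *InternalSymbol* and *lookup* are names I made up for illustration, scopes are plain dictionaries, and each internal symbol simply carries a tag that records which macro call it came from and where that macro was defined.

```julia
# Hypothetical model of the proposed lookup rule. None of these types exist in Julia.

struct Scope
    bindings::Dict{Any,Any}          # keys are Symbol (external) or InternalSymbol
    parent::Union{Scope,Nothing}     # enclosing scope, or nothing at the outermost level
end

mutable struct Context               # mutable, so every instance has its own identity
    defscope::Scope                  # scope where the macro was defined
end

struct InternalSymbol
    name::Symbol                     # the corresponding external symbol
    context::Context
end

function lookup(scope::Scope, key)
    s = scope
    while s !== nothing
        # An internal symbol only matches the same internal symbol,
        # and an external symbol only matches the same external symbol.
        haskey(s.bindings, key) && return s.bindings[key]
        s = s.parent
    end
    # An unbound internal symbol falls back to the external symbol of the
    # same name, looked up in the macro definition scope.
    if key isa InternalSymbol
        return lookup(key.context.defscope, key.name)
    end
    error("undefined variable: $key")
end

defscope  = Scope(Dict{Any,Any}(:tmp => "definition-site tmp",
                                :helper => "definition-site helper"), nothing)
callscope = Scope(Dict{Any,Any}(:tmp => "caller tmp"), nothing)
ctx       = Context(defscope)
# The macro expansion binds its own `tmp` inside the caller's scope.
body      = Scope(Dict{Any,Any}(InternalSymbol(:tmp, ctx) => "macro tmp"), callscope)

from_caller  = lookup(body, :tmp)                          # the caller's variable
from_macro   = lookup(body, InternalSymbol(:tmp, ctx))     # the macro's own variable
free_in_body = lookup(body, InternalSymbol(:helper, ctx))  # resolved at the definition site
```

In this model the macro's *tmp* and the caller's *tmp* live side by side without capturing each other, and the free reference to *helper* is resolved where the macro was defined.
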
How can this be implemented? External symbols are the plain old symbols
that already exist. Internal symbols are a new AST type with two fields,
name and context. The name is the corresponding external symbol. The
context remembers the scope where the macro was defined and is a unique
object freshly created for each macro call. The *quote* construct converts
literal external symbols to internal symbols. Interpolated data and
literal internal symbols are left alone. The unary colon short form of
*quote* is the same. Local variable names can be internal symbols and
variable binding lookup is adjusted as described above: two internal
symbols only match if both the names and the contexts are the same. Uses
of symbols other than as variable names are modified to treat internal
symbols the same as external.
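
The conversion that *quote* would perform can be sketched as a small tree walk. This is only a model under my own assumptions: *Context*, *InternalSymbol*, *Interp* and *internalize* are hypothetical names, *Interp* stands in for an interpolated value inside a template, and the context is passed explicitly instead of being picked up by *quote* itself.

```julia
# Hypothetical sketch of what the proposed `quote` would do to a template.

mutable struct Context               # a fresh object per macro call
    defmodule::Module                # where the macro was defined
end

struct InternalSymbol
    name::Symbol
    context::Context
end

struct Interp                        # marks interpolated data in this model
    value::Any
end

internalize(x::Symbol, ctx::Context)       = InternalSymbol(x, ctx)  # literal external symbol
internalize(x::InternalSymbol, ::Context)  = x                       # left alone
internalize(x::Interp, ::Context)          = x.value                 # left alone
internalize(x::Expr, ctx::Context)         =
    Expr(x.head, (internalize(a, ctx) for a in x.args)...)
internalize(x, ::Context)                  = x                       # numbers, QuoteNodes, ...

ctx = Context(Main)

# Template for `tmp = <argument>; tmp * tmp`, where the argument `x`
# came from the macro call and is therefore interpolated.
template = Expr(:block,
                Expr(:(=), :tmp, Interp(:x)),
                Expr(:call, :*, :tmp, :tmp))
expansion = internalize(template, ctx)

assignment = expansion.args[1]
lhs = assignment.args[1]             # internal `tmp`, invisible to the caller
rhs = assignment.args[2]             # plain `x`, the caller's variable

# A field name is stored as a QuoteNode, so it is not touched.
field_access = internalize(:(p.x), ctx)
```

Because the field name in *p.x* is not a bare symbol in the AST, the walk leaves it alone, which matches the rule that non-variable uses of symbols do not distinguish internal from external.
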
The *esc* function is no longer needed and should be removed. Any
expression that originated in the macro call is automatically escaped. Now
the *assert* example in the manual actually works. If a macro like the
*zerox* example in the manual needs to put an external symbol into the
expansion, it just uses a plain old symbol:
*macro zerox() :($(symbol("x")) = 0) end*.
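
For comparison, this is how the *zerox* case behaves in Julia as it exists today, where *symbol("x")* is spelled *Symbol("x")*. The macro and function names other than *zerox* are mine. The last line shows why the expander has to guess: an interpolated symbol and a literal one produce exactly the same expression, so the origin of *x* is already lost by the time hygiene is applied.

```julia
# Current Julia behavior, not the proposal.

macro zerox_hygienic()
    return :(x = 0)          # the expander renames `x` to a fresh local
end

macro zerox()
    return esc(:(x = 0))     # `esc` is needed to reach the caller's `x`
end

function foo_hygienic()
    x = 1
    @zerox_hygienic
    return x                 # still 1, the caller's `x` was not touched
end

function foo()
    x = 1
    @zerox
    return x                 # 0, the caller's `x` was assigned
end

# The two ways of writing the template are indistinguishable after quoting.
same_ast = :($(Symbol("x")) = 0) == :(x = 0)   # true
```
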
When a global variable is defined in a module, if the name is an internal
symbol it is converted to an external symbol so it is generally visible.
Where does the *quote* construct get the context when it makes an internal
symbol? There are several ways it could be done. I prefer for *quote* to
use the value of the variable *context*; if there is no variable with that
name in scope it is an error. The *macro* statement implicitly defines the
external symbol *context* in the expander function. Users who want to
break parts of the expander function into separate functions must pass the
context around explicitly. Users who want to build expressions outside of
a macro must define *context*. It may be useful to have a user-callable
*Context* constructor that takes a module as its argument.
Because internal symbols are not interned, there may be a speed decrease in
some cases. However, this cost is only incurred at compile time.
Macro-defining macros work, provided that when *quote* sees a literal
internal symbol it copies it unchanged into the expression being
constructed. Thus the expansion of a macro defined by a macro-defining
macro may contain internal symbols whose context comes from either macro.
In the same way, recursive or nested macros work, with each internal
symbol remembering the context where it originated.
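
A small model shows how contexts from two macros can end up in one expansion. As before, *Context*, *InternalSymbol* and *internalize* are hypothetical names of mine, and *internalize* stands in for the conversion that *quote* would do.

```julia
# Hypothetical sketch: internal symbols keep the context they were created with.

mutable struct Context
    label::String                    # only for readability in this model
end

struct InternalSymbol
    name::Symbol
    context::Context
end

internalize(x::Symbol, ctx)         = InternalSymbol(x, ctx)
internalize(x::InternalSymbol, ctx) = x      # copied unchanged
internalize(x::Expr, ctx)           =
    Expr(x.head, (internalize(a, ctx) for a in x.args)...)
internalize(x, ctx)                 = x

outer = Context("macro-defining macro")
inner = Context("macro it defines")

# The macro-defining macro contributes `counter + 1` from its own template.
piece = internalize(:(counter + 1), outer)

# The generated macro's template contains that piece literally,
# next to a symbol of its own, `step`.
template  = Expr(:call, :max, piece, :step)
expansion = internalize(template, inner)

counter_sym = expansion.args[2].args[2]   # came from the outer macro
step_sym    = expansion.args[3]           # came from the inner macro
```

Here *counter* still points at the outer context after the second pass, while *step* gets the inner one, so each would be looked up where it originated.
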
I have no strong opinion on whether macros are allowed to be defined in a
non-top-level context. If not, the macro definition scope remembered in an
internal symbol's context is just a module.
A literal symbol is no longer the same thing as an expression consisting
of only a literal variable name: the former produces an external symbol,
the latter an internal symbol. One approach
would be to disallow *:x* for a literal symbol and require *symbol("x")* to
be used, but the verbosity might be unpopular. Another approach would be
to treat *:x* as a special case; if you want to produce an internal symbol
x you must use *quote x end* or *:(x)*.
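
For reference, this is what the three spellings produce in Julia today (with *symbol("x")* now written *Symbol("x")*). The first two are the very same symbol, which is why some special case would be needed, while the block form already yields a different kind of object.

```julia
# Current Julia behavior, not the proposal.

a = :x                 # a Symbol
b = :(x)               # also just the Symbol :x
c = quote x end        # an Expr with head :block that contains :x
d = Symbol("x")        # the same Symbol again

colon_forms_identical = (a === b)        # true
constructor_identical = (a === d)        # true
block_is_expr         = c isa Expr && c.head === :block   # true
```
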
This proposal makes the following incompatible changes:
- Remove *esc* (or make it a no-op).
- *quote* no longer works if *context* is not defined.
- The same applies to unary colon, unless its argument is just a symbol.
Maybe the existing macros are good enough for your purposes, but I think
the hygiene could work better. What do you think?