[julia-users] Hygienic macros could be both better and simpler

David Moon Sun, 09 Feb 2014 12:18:05 -0800

Hygienic macros could be both better and simpler. Julia's hygiene only 
works in simple cases, I think, and requires too much manual intervention.  
This is because it is done in the wrong place, in the output of macro 
expansion, so the macro expander has to guess the context of each symbol.   
Hygiene ought to be done in the input to macro expansion, where the 
originating context of every symbol is known.  The *esc* function is a dead 
giveaway that something is wrong.  It could be eliminated, which would 
make macros simpler to define.  Errors like the one in the manual's sample 
definition of *assert* would no longer occur.
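
To make the *assert* problem concrete, here is a minimal sketch in current Julia syntax. The macro and function names (check_unescaped, check_escaped, demo_unescaped, demo_escaped, local_count) are made up for illustration, and the first macro is modelled on the manual's sample rather than copied from it. Without *esc*, the caller's expression is resolved in the macro's module, so a local variable of the caller is not seen.

```julia
# Modelled on the manual's sample: the caller's expression is spliced in unescaped.
macro check_unescaped(ex)
    :($ex ? nothing : error("Assertion failed: ", $(string(ex))))
end

# Same macro, with the caller's expression escaped by hand.
macro check_escaped(ex)
    :($(esc(ex)) ? nothing : error("Assertion failed: ", $(string(ex))))
end

function demo_unescaped()
    local_count = 5
    # `local_count` is looked up as a global of the macro's module, not as this local.
    @check_unescaped local_count > 0
    return :ok
end

function demo_escaped()
    local_count = 5
    @check_escaped local_count > 0
    return :ok
end
```

Calling demo_escaped() returns :ok, while demo_unescaped() fails with an UndefVarError as long as no global named local_count exists in the macro's module.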


I believe Julia's macros are taken directly from Scheme.  Scheme never 
found a fully satisfactory solution to the hygiene problem, because of the 
inflexibility of S-expressions.  I have been thinking about this issue for 
quite a few years.

Fortunately the problem is easily solved in Julia, where the representation 
of expressions is more flexible.  The key observation is that there should 
be two kinds of symbols in the expansion of a macro, which I will call 
"external" and "internal."  Internal symbols come from the macro 
definition, external symbols come from the macro call.  Informally, 
external symbols are visible to the caller of the macro and internal 
symbols are not, when used as variable names.  Other uses for symbols, such 
as terminals in the grammar ("else"), literals (":foo"), and field names 
(".x") do not distinguish internal from external.

More formally, when a variable name is looked up in a scope, an internal 
symbol only matches the same internal symbol.  Thus binding an internal 
symbol does not capture a reference to an external symbol, and vice versa.
When an internal symbol is not found in the current scope and its parents, 
the external symbol with the same name is looked up in the scope where the 
macro was defined.  Thus free references in a macro expansion will have the 
intended meaning, being looked up in the macro definition scope when the 
reference came from the macro, but in the macro call scope when the 
reference came from the caller.
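
The lookup rule can be modelled in a few lines. This is only a toy sketch under my own assumptions: Scope, Context, InternalSymbol and lookup are hypothetical names, scopes are plain dictionaries, and nothing here corresponds to how the Julia compiler actually represents scopes.

```julia
# Toy scope: a table of bindings plus an optional parent scope.
struct Scope
    bindings::Dict{Any,Any}
    parent::Union{Scope,Nothing}
end

# Mutable so that every instance has its own identity (one per macro call).
mutable struct Context
    definition_scope::Scope   # scope where the macro was defined
end

# An internal symbol pairs a plain (external) symbol with a context.
struct InternalSymbol
    name::Symbol
    context::Context
end

function lookup(scope::Scope, name)
    s = scope
    while s !== nothing
        # An internal symbol only matches a key with the same name and context.
        haskey(s.bindings, name) && return s.bindings[name]
        s = s.parent
    end
    if name isa InternalSymbol
        # Fall back to the external symbol in the macro definition scope.
        return lookup(name.context.definition_scope, name.name)
    end
    error("unbound variable: $name")
end

definition_scope = Scope(Dict{Any,Any}(:helper => "macro-side helper"), nothing)
call_scope = Scope(Dict{Any,Any}(:helper => "caller-side helper", :x => 1), nothing)

ctx = Context(definition_scope)
tmp = InternalSymbol(:x, ctx)
# The expansion binds the internal x inside the caller's scope.
expansion_scope = Scope(Dict{Any,Any}(tmp => 99), call_scope)
```

Here lookup(expansion_scope, :x) finds the caller's binding, lookup(expansion_scope, tmp) finds the macro's own binding, and a free internal reference such as InternalSymbol(:helper, ctx) resolves in the definition scope rather than in the caller's.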

How can this be implemented?  External symbols are the plain old symbols 
that already exist.  Internal symbols are a new AST type with two fields, 
name and context.  The name is the corresponding external symbol.  The 
context remembers the scope where the macro was defined and is a unique 
object freshly created for each macro call.  The *quote* construct converts 
literal external symbols to internal symbols.  Interpolated data and 
literal internal symbols are left alone.  The unary colon short form of 
*quote* is the same.  Local variable names can be internal symbols and 
variable binding lookup is adjusted as described above: two internal 
symbols only match if both the names and the contexts are the same.  Uses 
of symbols other than as variable names are modified to treat internal 
symbols the same as external.
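
As a rough sketch of what *quote* would do under this scheme, the function below walks a template and converts literal symbols. Everything in it is an assumption for illustration: hquote, Spliced (a stand-in for $ interpolation), Context and InternalSymbol are hypothetical names, and a real implementation would live inside the parser and macro expander rather than in user code.

```julia
# Mutable so that each context is a distinct object.
mutable struct Context
    definition_module::Module
end

struct InternalSymbol
    name::Symbol
    context::Context
end

# Stand-in for `$value` inside a quoted template.
struct Spliced
    value::Any
end

hquote(s::Symbol, ctx::Context) = InternalSymbol(s, ctx)   # literal external -> internal
hquote(s::InternalSymbol, ::Context) = s                   # literal internal: left alone
hquote(s::Spliced, ::Context) = s.value                    # interpolated data: left alone
hquote(ex::Expr, ctx::Context) =
    Expr(ex.head, Any[hquote(a, ctx) for a in ex.args]...)
hquote(x, ::Context) = x                                   # numbers, strings, and so on

ctx = Context(Main)
caller_expr = :(tmp + 1)   # came from the macro call, mentions the caller's tmp
# Template for `tmp = $caller_expr`, with the macro's own tmp on the left.
expansion = hquote(Expr(:(=), :tmp, Spliced(caller_expr)), ctx)

# Re-quoting inside another context keeps the existing internal symbol as it is.
outer_ctx = Context(Main)
requoted = hquote(Expr(:block, expansion.args[1], :other), outer_ctx)
```

The left-hand tmp becomes an internal symbol tied to ctx, while the tmp inside caller_expr stays a plain symbol, so the two cannot capture each other. The requoted expression shows the rule that macro-defining macros rely on: the first argument keeps its original context and only :other picks up outer_ctx.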

The *esc* function is no longer needed and should be removed.  Any 
expression that originated in the macro call is automatically escaped.  Now 
the *assert* example in the manual actually works.  If a macro like the 
*zerox* example in the manual needs to put an external symbol into the 
expansion, it just uses a plain old symbol: 
*macro zerox() :($(symbol("x")) = 0) end*.
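
For comparison, this is how the *zerox* case behaves under the existing mechanism, written in current Julia syntax. The macro and function names are made up for the comparison; only the escaped form follows the manual's example.

```julia
# With esc, the assignment targets the caller's x.
macro zerox_escaped()
    return esc(:(x = 0))
end

# Without esc, hygiene renames x to a fresh local in the expansion.
macro zerox_hygienic()
    return :(x = 0)
end

function with_escape()
    x = 1
    @zerox_escaped
    return x
end

function without_escape()
    x = 1
    @zerox_hygienic
    return x
end
```

with_escape() returns 0 and without_escape() returns 1. Under the proposal the roles would be the other way round: the plain quoted form would stay hygienic, and the explicit external symbol would be the marked case.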

When a global variable is defined in a module, if the name is an internal 
symbol it is converted to an external symbol so it is generally visible.

Where does the *quote* construct get the context when it makes an internal 
symbol?  There are several ways it could be done.  I prefer for *quote* to 
use the value of the variable *context*; if there is no variable with that 
name in scope it is an error.  The *macro* statement implicitly defines the 
external symbol *context* in the expander function.  Users who want to 
break parts of the expander function into separate functions must pass the 
context around explicitly.  Users who want to build expressions outside of 
a macro must define *context*.  It may be useful to have a user-callable 
*Context* constructor that takes a module as its argument.

Because internal symbols are not interned, there may be a speed decrease in 
some cases.  However this cost is only incurred at compile time.

Macro-defining macros work, provided that when *quote* sees a literal 
internal symbol it copies it unchanged into the expression being 
constructed.  Thus the expansion of a macro defined by a macro-defining 
macro may contain internal symbols whose context comes from either macro.
In the same way, recursive or nested macros work, with each internal 
symbol remembering the context where it originated.

I have no strong opinion on whether macros are allowed to be defined in a 
non-top-level context.  If not, the macro definition scope remembered in an 
internal symbol's context is just a module.

A literal symbol is no longer the same thing as construction of an 
expression consisting of only a literal variable name.  The former produces 
an external symbol, the latter produces an internal symbol.  One approach 
would be to disallow *:x* for a literal symbol and require *symbol("x")* to 
be used, but the verbosity might be unpopular.  Another approach would be 
to treat *:x* as a special case; if you want to produce an internal symbol 
x you must use *quote x end* or *:(x)*.
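
For reference, here is what the three spellings produce in Julia today (current syntax, variable names chosen for illustration). The first two are already indistinguishable, which is why treating *:x* specially would be a visible change.

```julia
plain_symbol = :x          # unary colon on a bare name
paren_quoted = :(x)        # parenthesized form
block_quoted = quote x end # block form

same_object = plain_symbol === paren_quoted
block_is_expr = block_quoted isa Expr && block_quoted.head === :block
```

Both :x and :(x) yield the same interned Symbol, while quote x end yields a block expression that contains the symbol x.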

Incompatible changes here:
- Remove *esc* (or make it a no-op).
- *quote* no longer works if *context* is not defined.
- Unary colon likewise no longer works if *context* is not defined, unless 
  its argument is just a symbol.

Maybe the existing macros are good enough for your purposes, but I think 
the hygiene could work better.  What do you think?
