Rainer Deyke wrote:
I'm not entirely happy with the way Scala handles the division between
statements - Scala's rules seem arbitrary and complex - but semicolons
*are* noise, no matter how habitually I use them and how much time I
waste removing them afterwards.

I don't know anything about Scala, but I've been working on an ActionScript compiler recently (the language is based on ECMAScript, so it's very much like JavaScript in this respect), and the optional semicolon rules are completely maddening.

The ECMAScript spec basically says: virtual semicolons must be inserted at end-of-line whenever the non-insertion of semicolons would result in an erroneous parse.

So there are really only three ways to handle it, and all of them are insane:

1) Treat the newline character as a token (rather than as skippable whitespace) and include that token as an optional construct in every single production where it can legally occur. This results in hundreds of optional newline tokens throughout the grammar, and makes the whole thing a nightmare to read, but at least it still uses a one-pass CFG.

    CLASS :=
      "class"
      NEWLINE?
      IDENTIFIER
      NEWLINE?
      "{"
      NEWLINE?
      (
        MEMBER
        NEWLINE?
      )*
      "}"

2) Use lexical lookahead, dispatched from the parser. The tokenizer determines whether to treat a newline as a statement terminator based on the current parse state (are we in the middle of a parenthesized expression?) and the upcoming tokens on the next line. This is nasty because the grammar becomes context-sensitive and conflates lexical analysis with parsing.
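To make the lookahead approach concrete, here's a rough Python sketch for a made-up toy language (not ActionScript, and not the real ECMAScript rules). The token set, the CONTINUES_AFTER / CONTINUES_BEFORE tables, and the tokenize function are all invented for illustration. The tokenizer tracks parenthesis depth itself as a stand-in for "current parse state"; in a real compiler that information would have to come from the parser.

```python
import re

# Hypothetical toy token set: newlines, numbers, identifiers, punctuation.
# Spaces and any other characters are simply skipped by findall().
TOKEN_RE = re.compile(r"\n|\d+|[A-Za-z_]\w*|[-+*/=(),.;]")

# Assumed rules for this toy language: a line ending in one of these tokens
# is unfinished, and a line starting with one of these continues the last one.
CONTINUES_AFTER = {"+", "-", "*", "/", "=", ",", ".", "("}
CONTINUES_BEFORE = {"+", "-", "*", "/", "=", ",", ".", ")"}

def tokenize(source):
    """Return tokens, turning some newlines into virtual ';' tokens."""
    raw = TOKEN_RE.findall(source)
    out = []
    depth = 0  # stand-in for parser state: are we inside parentheses?
    for i, tok in enumerate(raw):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if tok != "\n":
            out.append(tok)
            continue
        prev = out[-1] if out else ";"
        # Lexical lookahead: the first real token on a following line.
        nxt = next((t for t in raw[i + 1:] if t != "\n"), None)
        if depth > 0 or prev == ";" or prev in CONTINUES_AFTER:
            continue  # newline is plain whitespace here
        if nxt is None or nxt in CONTINUES_BEFORE:
            continue  # end of input, or the next line continues this one
        out.append(";")  # newline acts as a statement terminator
    return out
```

Even in this toy, the tokenizer needs to know about expression structure, which is exactly the conflation of lexing and parsing complained about above.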

3) Whenever the parser encounters an error, have it back up to the beginning of the previous production and insert a virtual semicolon into the token stream. Then try reparsing. Since there might be multiple newlines contained in a single multiline expression, it might take arbitrarily many rewrite attempts before reaching a correct parse.
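Here's a similarly rough Python sketch of the backup-and-retry idea, again for an invented toy grammar (statements of the form NAME = expr ;). All the names here (lex, parse, ParseError, parse_with_insertion) are hypothetical. To keep it short, it restarts the parse from the beginning after each insertion instead of backing up to the previous production, and it also allows a virtual semicolon at end of input.

```python
import re

TOKEN_RE = re.compile(r"(\n)|(\d+|[A-Za-z_]\w*|[-+=();])")

class ParseError(Exception):
    def __init__(self, index):
        super().__init__(f"unexpected token at index {index}")
        self.index = index  # position of the offending token

def lex(source):
    """Return (text, newline_before) pairs; newlines themselves are dropped."""
    tokens, newline_seen = [], False
    for nl, tok in TOKEN_RE.findall(source):
        if nl:
            newline_seen = True
        else:
            tokens.append((tok, newline_seen))
            newline_seen = False
    return tokens

def parse(tokens):
    """Parse `NAME = expr ;` statements and return how many were found."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(pattern):
        nonlocal pos
        tok = peek()
        if tok is None or re.fullmatch(pattern, tok) is None:
            raise ParseError(pos)
        pos += 1

    def atom():
        if peek() == "(":
            expect(r"\(")
            expr()
            expect(r"\)")
        else:
            expect(r"\w+")  # number or identifier

    def expr():
        atom()
        while peek() in ("+", "-"):
            expect(r"[-+]")
            atom()

    count = 0
    while pos < len(tokens):
        expect(r"[A-Za-z_]\w*")
        expect(r"=")
        expr()
        expect(r";")
        count += 1
    return count

def parse_with_insertion(source):
    """Retry the parse, adding one virtual ';' per failure where allowed."""
    tokens = lex(source)
    attempts = 1
    while True:
        try:
            return parse(tokens), attempts
        except ParseError as err:
            i = err.index
            at_end = i == len(tokens)
            after_newline = i < len(tokens) and tokens[i][1]
            if not (at_end or after_newline):
                raise  # no newline to blame: a genuine syntax error
            if not at_end:
                # Clear the flag so we never insert twice at the same spot.
                tokens[i] = (tokens[i][0], False)
            tokens.insert(i, (";", False))  # the virtual semicolon
            attempts += 1
```

Every missing terminator costs another full parse attempt, and newlines that sit inside an expression are only known to be harmless because the parser never trips on them.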

The thing about most compiler construction tools is that they don't allow the kind of interaction between the parser and the tokenizer that context-guided tokenization requires, and they're not designed for backup-and-retry processing or for inserting virtual tokens into the token stream.

Ugly stuff.

Anyhoo, I know this is waaaaaaay off topic. But I think any language designer including optional semicolons in their language desperately deserves a good swift punch in the teeth.

--benji
