> 3. Make semicolons mandatory.

+1

Ariel Jakobovits
Email: arielj...@yahoo.com
Phone: 650-690-2213
Fax: 650-641-0031
Cell: 650-823-8699

________________________________
From: David Arno <da...@davidarno.org>
To: flex-dev@incubator.apache.org
Sent: Tuesday, February 21, 2012 9:27 AM
Subject: [gosh] On the sad tale of BNF and optional semicolons

I have spent a bit of time over the last few days trying to define a BNF grammar for AS3. As "Left Right" predicted, it's looking like this just isn't possible. The problem is such a trivial little thing too: optional semicolons. To illustrate the problem, let me give an example piece of AS3 BNF:

    imports = import | imports import
    import = 'import' type_reference ';'

As I'm sure there are many here who are unfamiliar with BNF, I'll explain what the above means. The first line defines a symbol: imports, which is defined as either being import, or a recursive reference to itself, followed by import. In other words, a language consisting of one or more import symbols is a language that matches the imports symbol. Next, I define what the import symbol is, it being an "import" keyword token, followed by yet another symbol, type_reference (which I haven't included here) and finally a semicolon. This snippet then covers import collections, such as:

    import flash.display.DisplayObject;
    import flash.events.Event;
    import flash.events.MouseEvent;

However, there is a problem. AS3 doesn't mandate that I specify those semicolons at the end of the lines. AS3 supports implied semicolons: in certain circumstances, the newline character is good enough to tell the compiler that the end of a statement has been reached, so a semicolon isn't required. Unfortunately, such a concept cannot be handled by BNF. If end of line characters are significant within the grammar, then within BNF, they must be explicitly referenced.

As a result, the BNF becomes really complex:

    imports = import | imports import
    import = 'import' type_reference statement_terminator
           | 'import' unimportant_newlines type_reference statement_terminator
    unimportant_newlines = '\n' | unimportant_newlines '\n'
    statement_terminator = ';'
                         | ';' unimportant_newlines
                         | '\n'
                         | '\n' unimportant_newlines
                         | unimportant_newlines ';'
                         | unimportant_newlines ';' unimportant_newlines

Not only does this become unreadable (remember, unimportant_newlines will appear absolutely everywhere in the BNF where whitespace is allowed in the code), but any tool that generates a parser from BNF-like definitions will complain of conflicts, because a newline character can match several rules at once and the parser has to guess which one to use.

As far as I can see, we have three choices:

1. Hand craft a parser that can handle optional semicolons, rather than using a BNF-based one. I really don't want to do this as it requires us to pick one language for the compiler, it's harder to maintain and takes longer to write.

2. Hand craft a lexical analyser that knows about optional semicolons and inserts missing ones into the token stream passed to the parser. I have to confess I've no idea at this stage how feasible this would be, and it has the same issues of language-specificity and complexity as the previous option.

3. Make semicolons mandatory.

The purpose of this email is to gauge people's reactions to option 3. If we created a compiler that mandated semicolons, would this cause problems for anyone? Is it an idea we can consider, or is it a complete no-no?

Thoughts and opinions please people.

David.
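
For anyone trying to picture option 2, here is a rough sketch in Python, limited to the import rule from the example. It is only an illustration under simplifying assumptions, not a proposal for the real compiler: the names TOKEN_RE, tokenize and parse_imports are made up for this sketch, and the rule it uses (a newline ends a statement only when the previous token is a type reference) is an assumption that stands in for AS3's actual implied-semicolon rules, which are more involved. The lexer inserts the missing ';' tokens, so the parser can keep the simple grammar with mandatory semicolons.

```python
import re

# Hypothetical token pattern for this sketch: a (dotted) name, ';' or a newline.
# The 'import' keyword is matched by the name alternative.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*|;|\n")


def tokenize(source):
    """Return the token list, inserting ';' where a newline ends a statement."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text == "\n":
            # Assumed rule: a newline terminates a statement only when the
            # previous token is a type reference (not 'import', not ';').
            if tokens and tokens[-1] not in ("import", ";"):
                tokens.append(";")
            continue  # newlines themselves never reach the parser
        tokens.append(text)
    # End of input also terminates an unfinished statement.
    if tokens and tokens[-1] not in ("import", ";"):
        tokens.append(";")
    return tokens


def parse_imports(tokens):
    """Parse: imports = import | imports import
              import  = 'import' type_reference ';'
    """
    names = []
    i = 0
    while i < len(tokens):
        statement = tokens[i:i + 3]
        if (len(statement) != 3
                or statement[0] != "import"
                or statement[1] in ("import", ";")
                or statement[2] != ";"):
            raise SyntaxError("bad import statement near token %d" % i)
        names.append(statement[1])
        i += 3
    if not names:
        raise SyntaxError("expected at least one import")
    return names


source = (
    "import flash.display.DisplayObject\n"
    "import\n"
    "    flash.events.Event;\n"
    "import flash.events.MouseEvent"
)
print(parse_imports(tokenize(source)))
```

The parser never sees a newline, so the grammar stays at the two-line version. The cost is that the insertion rule lives in hand-written lexer code, which is the language-specificity concern raised under option 2.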