On 2/7/2018 1:07 PM, Nathan S. wrote:
On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
nobody uses regex for lexer in a compiler.
Some years ago I was surprised when I saw this in Clojure's source code. It
appears to still be there today:
https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73
---
static Pattern symbolPat =
    Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");
//static Pattern varPat =
//    Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
//static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
static Pattern intPat =
    Pattern.compile(
        "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
static Pattern floatPat =
    Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
---
Yes, I'm sure somebody does it. But once the regex has produced a match, you
have to scan the matched text again to turn it into a number, which makes for
slow lexing. And if the regex doesn't produce a match, you get a generic error
message rather than
something specific like "character 'A' is not allowed in a numeric literal".
(Generic error messages are one of the downsides of using tools like lex and
yacc.)
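
To make the contrast concrete, here is a minimal sketch of a hand-written scanner that computes the value of a decimal integer literal in the same pass that recognizes it, and reports the offending character when it fails. The class and method names (`OnePassIntLexer`, `lexDecimal`) are hypothetical and come from neither Clojure nor any real compiler; the sketch handles only unsigned decimal literals and assumes that a letter immediately following a digit is an error.

```java
// Hypothetical illustration only: not taken from Clojure or any real compiler.
final class OnePassIntLexer {

    /**
     * Scans an unsigned decimal integer literal starting at offset pos and
     * returns its value. The value is accumulated while scanning, so the
     * text is never visited a second time.
     */
    static long lexDecimal(String src, int pos) {
        int i = pos;
        if (i >= src.length() || !isAsciiDigit(src.charAt(i))) {
            throw new IllegalArgumentException("expected a digit at offset " + i);
        }
        long value = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (isAsciiDigit(c)) {
                // Exact arithmetic throws ArithmeticException on overflow
                // instead of silently wrapping around.
                value = Math.addExact(Math.multiplyExact(value, 10L), c - '0');
                i++;
            } else if (Character.isLetter(c)) {
                // Assumption of this sketch: a letter inside a number is an
                // error, and the message names the exact character.
                throw new IllegalArgumentException(
                    "character '" + c + "' is not allowed in a numeric literal"
                        + " (offset " + i + ")");
            } else {
                break; // any other character ends the literal
            }
        }
        return value;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

With this sketch, `OnePassIntLexer.lexDecimal("123+4", 0)` returns 123 after a single scan, while `OnePassIntLexer.lexDecimal("12A4", 0)` throws an exception whose message names the character 'A' and its offset. A failed regex match, by contrast, only tells you that the input did not match.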