On 2/7/2018 1:07 PM, Nathan S. wrote:
On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
nobody uses regex for lexer in a compiler.

Some years ago I was surprised when I saw this in Clojure's source code. It appears to still be there today:

https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73

---
static Pattern symbolPat = Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"); //static Pattern varPat = Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
//static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
static Pattern intPat =
         Pattern.compile(
"([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
static Pattern floatPat = Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
---


Yes, I'm sure somebody does it. And now that regex has produced a match, you have to scan it again to turn it into a number, making for slow lexing. And if regex doesn't produce a match, you get a generic error message rather than something specific like "character 'A' is not allowed in a numeric literal".

(Generic error messages are one of the downsides of using tools like lex and 
yacc.)

Reply via email to