Hi!

> On 1 March 2019 at 03:10, H. S. Teoh <[email protected]> wrote:
>
> On Thu, Feb 28, 2019 at 10:19:02PM +0100, Akim Demaille wrote:
>> Hi HS!
>>
>> This is already quite advanced compared to the current state of the
>> D skeleton. What I'm looking for is a simple scanner that works with
>> what we have. For instance below I tried to import your suggestion
>> into examples/d/calc.y, but it does not work. Could you help me
>> make it work? Just put it in examples/d/ and run "make check".
> [...]
>
> Try the attached version of calc.y instead.
Ah, I had not realized I could directly plug what you sent into the
existing framework. I expected to have to change the skeleton.

> Eventually I think we should move away from the Lexer class, and have
> the generated parser accept any type that has the required interface
> (yylex(), semanticVal(), etc.).

I agree. We both agree that the current skeleton has a Java flavor.

> In the ideal case, the lexer would
> simply be a struct that wraps around an arbitrary input range of
> characters. In order to make things work with the current code, though,
> I conceded to make CalcLexer a class that implements Lexer. We can
> change that when we get around to moving away from lexer classes.

Exactly. I want the test suite to cover more and more cases (debugging
traces and locations should have priority imho), but that should not
stop you from improving the skeleton!

The version you sent works, thanks! Is it ok if I install it under
your name? The work is yours, not mine (and there's a question for you
afterwards).

(For the record, the GNU Coding Standards require a space before the
parens in function calls. But the GCS were written because there are
too many styles in C/C++, and you have to pick one. If the same
dissonance hits D, then let's stick to the GCS. However, if there is
one "true" style in D, tell me and let's stick to it.)

commit 6a56d941eccbeed5b48f432fa2cbf1193fe8009d
Author: H. S. Teoh <[email protected]>
Date:   Fri Mar 1 06:16:54 2019 +0100

    d: modernize the scanner of the example

    https://lists.gnu.org/archive/html/bison-patches/2019-02/msg00121.html

    * examples/d/calc.y (CalcLexer): Stop shoehorning C's API into D:
    use a range based approach in the scanner, rather than some
    imitation of getc/ungetc.
    (main): Adjust.
diff --git a/examples/d/calc.y b/examples/d/calc.y
index 5c7975a3..c97e4f3a 100644
--- a/examples/d/calc.y
+++ b/examples/d/calc.y
@@ -53,24 +53,23 @@ exp:
 ;

 %%
-class CalcLexer : Lexer {
+import std.range.primitives;

-  // Should be a local in main, shared with %parse-param.
-  int exit_status = 0;
+auto calcLexer(R)(R range)
+  if (isInputRange!R && is (ElementType!R : dchar))
+{
+  return new CalcLexer!R(range);
+}

-  int
-  get_char ()
-  {
-    import stdc = core.stdc.stdio;
-    return stdc.getc (stdc.stdin);
-  }
+class CalcLexer(R) : Lexer
+  if (isInputRange!R && is (ElementType!R : dchar))
+{
+  R input;

-  void
-  unget_char (int c)
-  {
-    import stdc = core.stdc.stdio;
-    stdc.ungetc (c, stdc.stdin);
-  }
+  this(R r) { input = r; }
+
+  // Should be a local in main, shared with %parse-param.
+  int exit_status = 0;

   public void yyerror (string s)
   {
@@ -78,53 +77,39 @@ class CalcLexer : Lexer {
     stderr.writeln (s);
   }

-  int
-  read_signed_integer ()
-  {
-    int c = get_char ();
-    int sign = 1;
-    int n = 0;
-
-    if (c == '-')
-    {
-      c = get_char ();
-      sign = -1;
-    }
-
-    while (isDigit (c))
-    {
-      n = 10 * n + (c - '0');
-      c = get_char ();
-    }
-
-    unget_char (c);
-    return sign * n;
-  }
-
   YYSemanticType semanticVal_;

-  public final @property YYSemanticType semanticVal()
+  public final @property YYSemanticType semanticVal ()
   {
     return semanticVal_;
   }

   int yylex ()
   {
-    int c;
-    /* Skip white spaces. */
-    do
-      {}
-    while ((c = get_char ()) == ' ' || c == '\t');
-
-    /* process numbers */
-    if (c == '.' || isDigit (c))
+    import std.uni : isWhite, isNumber;
+
+    // Skip initial spaces
+    while (!input.empty && input.front != '\n' && isWhite (input.front))
+    {
+      input.popFront;
+    }
+
+    // Handle EOF.
+    if (input.empty)
+      return YYTokenType.EOF;
+
+    // Numbers.
+    if (input.front == '.' || input.front.isNumber)
     {
-      unget_char (c);
-      semanticVal_.ival = read_signed_integer ();
+      import std.conv : parse;
+      semanticVal_.ival = input.parse!int;
       return YYTokenType.NUM;
     }

-    switch (c)
+    // Individual characters
+    auto ch = input.front;
+    input.popFront;
+    switch (ch)
     {
       case EOF: return YYTokenType.EOF;
       case '=': return YYTokenType.EQ;
@@ -142,7 +127,16 @@ class CalcLexer : Lexer {

 int main ()
 {
-  CalcLexer l = new CalcLexer ();
+  import std.algorithm : map, joiner;
+  import std.stdio;
+  import std.utf : byDchar;
+
+  auto l = stdin
+    .byChunk(1024) // avoid making a syscall roundtrip per char
+    .map!(chunk => cast(char[]) chunk) // because byChunk returns ubyte[]
+    .joiner // combine chunks into a single virtual range of char
+    .calcLexer;
+
   Calc p = new Calc (l);
   p.parse ();
   return l.exit_status;

I think we should also expose a simpler API to build the scanner from a
File, *in addition* to the range based one? Just as a convenience
wrapper. WDYT?

commit bfb6abf6aa1e4bae7ffaec76c4dea5482b242331
Author: Akim Demaille <[email protected]>
Date:   Fri Mar 1 06:37:58 2019 +0100

    d: simplify the API to build the scanner of the example

    * examples/d/calc.y (calcLexer): Add an overload for File.
    Use it.
diff --git a/examples/d/calc.y b/examples/d/calc.y
index c97e4f3a..99d343bc 100644
--- a/examples/d/calc.y
+++ b/examples/d/calc.y
@@ -61,6 +61,16 @@ auto calcLexer(R)(R range)
   return new CalcLexer!R(range);
 }

+auto calcLexer(File input)
+{
+  import std.algorithm : map, joiner;
+  auto l = input
+    .byChunk(1024) // avoid making a syscall roundtrip per char
+    .map!(chunk => cast(char[]) chunk) // because byChunk returns ubyte[]
+    .joiner; // combine chunks into a single virtual range of char
+  return calcLexer(l);
+}
+
 class CalcLexer(R) : Lexer
   if (isInputRange!R && is (ElementType!R : dchar))
 {
@@ -127,17 +137,8 @@ class CalcLexer(R) : Lexer

 int main ()
 {
-  import std.algorithm : map, joiner;
-  import std.stdio;
-  import std.utf : byDchar;
-
-  auto l = stdin
-    .byChunk(1024) // avoid making a syscall roundtrip per char
-    .map!(chunk => cast(char[]) chunk) // because byChunk returns ubyte[]
-    .joiner // combine chunks into a single virtual range of char
-    .calcLexer;
-
-  Calc p = new Calc (l);
+  auto l = stdin.calcLexer ();
+  auto p = new Calc (l);
   p.parse ();
   return l.exit_status;
 }
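For the longer-term idea discussed earlier in the thread (a lexer that is just a struct wrapping an input range of characters, with the generated parser accepting any type that has the required interface instead of a `Lexer` class), here is a minimal sketch of what that might look like. It is not the current D skeleton's API: `Tok`, `isCalcLexer`, `RangeLexer`, `rangeLexer` and `evalSum` are hypothetical names, `evalSum` only stands in for a generated parser, and the required interface is assumed to be just `yylex ()` plus a `semanticVal` member.

```d
// Minimal sketch with hypothetical names (not Bison's D skeleton API):
// a lexer that is a plain struct over any input range of characters, and
// a "parser" that accepts any type providing the assumed interface.
import std.ascii : isDigit;
import std.conv : parse;
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.uni : isWhite;

// Hypothetical token kinds; a generated parser would define its own.
enum Tok { EOF, NUM, PLUS, MINUS, OTHER }

// Assumed duck-typed requirement: yylex () returns a Tok and
// semanticVal yields an int.  Any type passing this check is accepted.
enum isCalcLexer(L) = is (typeof ((L l) {
  Tok t = l.yylex ();
  int v = l.semanticVal;
}));

struct RangeLexer(R)
  if (isInputRange!R && is (ElementType!R : dchar))
{
  R input;
  int semanticVal;

  Tok yylex ()
  {
    // Skip white space other than '\n', as the patch does.
    while (!input.empty && input.front != '\n' && isWhite (input.front))
      input.popFront ();
    if (input.empty)
      return Tok.EOF;
    // Only ASCII digits start a number here; std.conv.parse consumes them.
    if (isDigit (input.front))
    {
      semanticVal = input.parse!int;
      return Tok.NUM;
    }
    auto ch = input.front;
    input.popFront ();
    switch (ch)
    {
      case '+': return Tok.PLUS;
      case '-': return Tok.MINUS;
      default:  return Tok.OTHER;
    }
  }
}

// Convenience constructor so the range type is inferred.
auto rangeLexer(R)(R r)
  if (isInputRange!R && is (ElementType!R : dchar))
{
  return RangeLexer!R (r);
}

// Stand-in for a generated parser: sums "a + b - c" left to right.
int evalSum(L)(ref L lexer)
  if (isCalcLexer!L)
{
  int total = 0;
  int sign = 1;
  for (Tok t = lexer.yylex (); t != Tok.EOF; t = lexer.yylex ())
  {
    if (t == Tok.NUM)
      total += sign * lexer.semanticVal;
    else if (t == Tok.PLUS)
      sign = 1;
    else if (t == Tok.MINUS)
      sign = -1;
  }
  return total;
}

static assert (isCalcLexer!(RangeLexer!string));

void main ()
{
  import std.stdio : writeln;
  auto l = rangeLexer ("1 - 2 + 30");
  writeln (evalSum (l)); // prints 29
}
```

In this shape the parser's requirement is checked at compile time by a template constraint rather than by inheritance, so a struct, a class, or anything else with those members would be accepted. Whether the generated parser should become a template in this way is the open design question from the thread.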
