On 20/06/2010 22:46, Alix Pexton wrote:
On 20/06/2010 21:37, Ellery Newcomer wrote:
On 06/20/2010 03:01 PM, Alix Pexton wrote:
On 19/06/2010 21:12, Alix Pexton wrote:
I've been sketching some grammar diagrams for D2.0, a little like those
on JSON.org, and of course I didn't get far before I ran into something
odd.


I think I will take the plunge and base my diagrams on the source of
DMD. After looking at the code in lexer.c, it does not seem as far
beyond my rusty old C++ parsing skills as I had expected! Massive credit
to Walter for having a codebase that is as mature as DMD without it
turning into a labyrinth of preprocessor macros and cryptic "comefrom"s.

This will mean however that my little project may take a little longer,
sigh...

A...

Do share. I've always been too lazy to read lexer.c, and from this
discussion, it sounds like there are a few spots where my own lexer
grammar is incorrect (or at least differs from dmd).


of course ^^

A...

Well, I think I have got my head around lexer.c now, and its various peculiarities, like "000377." being a valid float (although not according to my shiny new, limited edition copy of tDPL (fig2.2 p35)^^).

The weirdness occurs because some of the corner cases are handled not by the neat little state machine that validates reals, but in the scanner at the point where it recognises a number beginning with a zero. The productions in lex.html represent the range of inputs that are accepted by the state machine without taking into account that the scanner rejects the sequence "._" (which makes sense as that is the identifier "_" in the outer scope).
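
To make that split concrete, here is a rough toy sketch in Python of the two-layer idea. It is not a transcription of lexer.c: the names PERMISSIVE_REAL and scan_number are made up for illustration, and the regular expression only stands in for the real state machine.

```python
import re

# Toy stand-in for the "state machine" that validates reals: digits and
# underscores, a dot, then optional digits and underscores. It is
# deliberately permissive and is NOT DMD's actual rule set.
PERMISSIVE_REAL = re.compile(r"[0-9][0-9_]*\.[0-9_]*")

def scan_number(src):
    """Return (kind, lexeme) for the number at the start of src."""
    m_int = re.match(r"[0-9][0-9_]*", src)
    if m_int is None:
        raise ValueError("input does not start with a digit")
    rest = src[m_int.end():]
    # Scanner-level check, made before the permissive pattern is tried:
    # the sequence "._" is rejected here, so the dot is left for the
    # next token and only the integer part is returned.
    if rest.startswith("._"):
        return ("integer", m_int.group())
    m_real = PERMISSIVE_REAL.match(src)
    if m_real:
        return ("real", m_real.group())
    return ("integer", m_int.group())
```

In this toy, scan_number("000377.") comes back as a real, while scan_number("1._5") stops after the integer "1" even though PERMISSIVE_REAL on its own would happily match "1._5". That gap between the pattern and the scanner is the kind of thing a diagram drawn only from the productions would miss.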

Andrei's analysis in tDPL also points out that 0xp0 is a valid hexfloat, but a strict reading of lex.html would not allow it.
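
One way to picture the difference, as a hedged sketch only: the two patterns below are hypothetical, and they assume that a "strict reading" means at least one hex digit must appear before the exponent. Neither is copied from lex.html or from lexer.c, and underscores are left out to keep it short.

```python
import re

HEX = r"[0-9a-fA-F]"

# Hypothetical "strict" reading: at least one hex digit somewhere
# between the 0x prefix and the p exponent.
STRICT_HEXFLOAT = re.compile(
    rf"0[xX](?:{HEX}+\.?{HEX}*|\.{HEX}+)[pP][+-]?[0-9]+"
)

# Hypothetical permissive reading: the hex digits may be absent entirely.
LOOSE_HEXFLOAT = re.compile(
    rf"0[xX]{HEX}*\.?{HEX}*[pP][+-]?[0-9]+"
)

def is_strict_hexfloat(text):
    # fullmatch requires the whole string to be one literal
    return STRICT_HEXFLOAT.fullmatch(text) is not None

def is_loose_hexfloat(text):
    return LOOSE_HEXFLOAT.fullmatch(text) is not None
```

Under these assumed patterns, "0x1.8p3" passes both checks, whereas "0xp0" passes only the loose one.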

Overall the diagram for hexfloat is much simpler than the one for decimalfloat, which I think will have to be split into 3 ><

A...

PS, octal must die!
