Hi again Per,

    I see my problem - I had written NUMBER+ instead of NUMBER thinking
that I could test multiple numbers from the same file.  In an actual
program, there would be some token between the numbers, so what you
presented to me was correct.

                                                                    Craig


On 2/18/07, Craig Ugoretz <[EMAIL PROTECTED]> wrote:

Hi Per,

     Thank you for your quick and relevent response.  However, in all
truthfulness, I tried a scheme about the same as you proposed, but ran into
trouble.  I apologize for not being more explicit about the nature of my
trouble.  For example, when I input 0b001101567, I get the following
parse:

Parse tree from AbstractMachine3.txt:

Program(2001)

Int(2002)

NumberToken(2003)

BIN_NUMBER(1005): "0b001101", line: 1, col: 1

Int(2002)

NumberToken(2003)

DEC_NUMBER(1007): "567", line: 1, col: 9
     The problem is that the parse is breaking the inputted number into
two parts, instead of leaving it as one and reporting an error, i.e. a
binary number should not contain digits other than 0 or 1.  I speculate this
problem is due to the way the regular expressions for the tokens are
constructed, hence my question about disambiguating the grammar.  Again, I
was not explicit in my orginal query.  Additionally, in OCT_NUMBER, the
first "0" (zero) should be an "O" (oh).

                                                               Thanks,
                                                               Craig

 On 2/18/07, Per Cederberg <[EMAIL PROTECTED]> wrote:
>
> Hi Craig,
>
> Most of the grammar below lends itself well to
> tokenization with regular expressions. Consider
> the following tokens:
>
> BIN_NUMBER           = <<0(B|b)[0-1]+>>
> OCT_NUMBER           = <<0[0-7]*>>
> DEC_NUMBER           = <<[1-9][0-9]*>>
> HEX_NUMBER           = <<0(x|X)[0-9A-Fa-f]+>>
> MINUS                = "~"
> DOT                  = "."
> E                    = <<(e|E)>>
>
> With the help of these you can rewrite the rest of
> the grammar:
>
> Int         = ["~"] NumberToken ;
> NumberToken = BIN_NUMBER
>             | OCT_NUMBER
>             | DEC_NUMBER
>             | HEX_NUMBER ;
> Float       = ["~"] DEC_NUMBER "." [DEC_NUMBER] [Exponent] ;
> Exponent    = E ["~"] DEC_NUMBER
>
> If you are working to expand this into a full
> programming language grammar, you'll run into
> issues with the E token. As the tokenizer is
> not context sensitive, it will always return
> the longest matching token.
>
> Also, I many grammars the definition of float
> and integer decimal number are both built into
> the same DEC_NUMBER token. For a full language
> that is probably the better solution, leaving
> some validation controls to the analyzer stage.
> Here I opted for something more similar to your
> original grammar.
>
> Cheers,
>
> /Per
>
> Craig Ugoretz wrote:
> > Hello,
> >
> >      I am new to grammatica (and parsers in general) and I have a
> > grammar that I am trying to disambiguate.  Can anyone lend any advice?
> > Hopefully, this should get me on the right track with the rest of my
> > work...  I apologize for the notation - it is EBNF, but nonstandard
> (and
> > non-grammatica).
> >
> > <int> ::= ['~'] <nzdigit> { <digit> }
> >          |  ['~'] O { <octdigit> }+
> >          |  ['~'] ('0x' | '0X') { <hexdigit> }+
> >          |  ['~'] ('0b' | '0B') { <bindigit> }+
> > <float> ::= ['~'] { <digit> }+ '.' { <digit> } { ('e' | 'E') ['~'] {
> > <digit> }+
> > <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
> > <nzdigit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
> > <octdigit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
> > <hexdigit> ::= <digit> | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B'
> |
> > 'C' | 'D' | 'E' | 'F'
> > <bindigit> ::= 0 | 1
> >
> > Can proper tokenization alone with regular expressions lend itself to
> > disambiguating the grammar?  This was a tactic that I tried, but was
> not
> > familar enough with regular expressions to make progress.
> >
> >
> >
> ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Grammatica-users mailing list
> > Grammatica-users@nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/grammatica-users
>
>
> _______________________________________________
> Grammatica-users mailing list
> Grammatica-users@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
>


_______________________________________________
Grammatica-users mailing list
Grammatica-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/grammatica-users

Reply via email to