Hi again Per,

I see my problem - I had written NUMBER+ instead of NUMBER thinking that I could test multiple numbers from the same file. In an actual program, there would be some token between the numbers, so what you presented to me was correct.

Craig

On 2/18/07, Craig Ugoretz <[EMAIL PROTECTED]> wrote:
Hi Per,

Thank you for your quick and relevant response. However, in all truthfulness, I tried a scheme about the same as you proposed, but ran into trouble. I apologize for not being more explicit about the nature of my trouble. For example, when I input 0b001101567, I get the following parse:

Parse tree from AbstractMachine3.txt:
Program(2001)
  Int(2002)
    NumberToken(2003)
      BIN_NUMBER(1005): "0b001101", line: 1, col: 1
  Int(2002)
    NumberToken(2003)
      DEC_NUMBER(1007): "567", line: 1, col: 9

The problem is that the parse breaks the input number into two parts, instead of leaving it as one and reporting an error, i.e. a binary number should not contain digits other than 0 or 1. I speculate this problem is due to the way the regular expressions for the tokens are constructed, hence my question about disambiguating the grammar. Again, I was not explicit in my original query. Additionally, in OCT_NUMBER, the first "0" (zero) should be an "O" (oh).

Thanks,

Craig

On 2/18/07, Per Cederberg <[EMAIL PROTECTED]> wrote:
>
> Hi Craig,
>
> Most of the grammar below lends itself well to
> tokenization with regular expressions. Consider
> the following tokens:
>
>   BIN_NUMBER = <<0(B|b)[0-1]+>>
>   OCT_NUMBER = <<0[0-7]*>>
>   DEC_NUMBER = <<[1-9][0-9]*>>
>   HEX_NUMBER = <<0(x|X)[0-9A-Fa-f]+>>
>   MINUS = "~"
>   DOT = "."
>   E = <<(e|E)>>
>
> With the help of these you can rewrite the rest of
> the grammar:
>
>   Int = ["~"] NumberToken ;
>   NumberToken = BIN_NUMBER
>               | OCT_NUMBER
>               | DEC_NUMBER
>               | HEX_NUMBER ;
>   Float = ["~"] DEC_NUMBER "." [DEC_NUMBER] [Exponent] ;
>   Exponent = E ["~"] DEC_NUMBER
>
> If you are working to expand this into a full
> programming language grammar, you'll run into
> issues with the E token. As the tokenizer is
> not context sensitive, it will always return
> the longest matching token.
>
> Also, in many grammars the definitions of float
> and integer decimal numbers are both built into
> the same DEC_NUMBER token. For a full language
> that is probably the better solution, leaving
> some validation controls to the analyzer stage.
> Here I opted for something more similar to your
> original grammar.
>
> Cheers,
>
> /Per
>
> Craig Ugoretz wrote:
> > Hello,
> >
> > I am new to grammatica (and parsers in general) and I have a
> > grammar that I am trying to disambiguate. Can anyone lend any advice?
> > Hopefully, this should get me on the right track with the rest of my
> > work... I apologize for the notation - it is EBNF, but nonstandard
> > (and non-grammatica).
> >
> > <int> ::= ['~'] <nzdigit> { <digit> }
> >         | ['~'] O { <octdigit> }+
> >         | ['~'] ('0x' | '0X') { <hexdigit> }+
> >         | ['~'] ('0b' | '0B') { <bindigit> }+
> > <float> ::= ['~'] { <digit> }+ '.' { <digit> } { ('e' | 'E') ['~'] { <digit> }+
> > <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
> > <nzdigit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
> > <octdigit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
> > <hexdigit> ::= <digit> | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B'
> >              | 'C' | 'D' | 'E' | 'F'
> > <bindigit> ::= 0 | 1
> >
> > Can proper tokenization alone with regular expressions lend itself to
> > disambiguating the grammar? This was a tactic that I tried, but was
> > not familiar enough with regular expressions to make progress.
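
For readers following the thread, here is a minimal sketch in Python (not Grammatica itself) of the behaviour discussed above. It assumes a context-free, longest-match tokenizer, as Per describes, and uses the four number-token patterns from his message. The names tokenize, parse_single_number and parse_number_list are hypothetical, and skipping whitespace between tokens is an assumption made for the illustration. It shows why 0b001101567 splits into a BIN_NUMBER and a DEC_NUMBER, and why a rule that accepts NUMBER+ lets that split through while a rule that accepts a single NUMBER reports an error.

```python
import re

# Number token patterns as given in Per's message (order as in the thread).
TOKEN_PATTERNS = [
    ("BIN_NUMBER", re.compile(r"0(B|b)[0-1]+")),
    ("OCT_NUMBER", re.compile(r"0[0-7]*")),
    ("DEC_NUMBER", re.compile(r"[1-9][0-9]*")),
    ("HEX_NUMBER", re.compile(r"0(x|X)[0-9A-Fa-f]+")),
]


def tokenize(text):
    """Split text into (name, lexeme, column) tuples using longest match.

    The tokenizer has no context: at each position it simply takes the
    longest lexeme that any pattern can match.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace separates tokens
            pos += 1
            continue
        best_name, best_match = None, None
        for name, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match and (best_match is None or match.end() > best_match.end()):
                best_name, best_match = name, match
        if best_match is None:
            raise ValueError(f"no token matches at column {pos + 1}")
        tokens.append((best_name, best_match.group(), pos + 1))
        pos = best_match.end()
    return tokens


def parse_number_list(text):
    """Accept one or more number tokens (the NUMBER+ rule)."""
    tokens = tokenize(text)
    if not tokens:
        raise SyntaxError("expected at least one number")
    return tokens


def parse_single_number(text):
    """Accept exactly one number token (the NUMBER rule)."""
    tokens = tokenize(text)
    if len(tokens) != 1:
        raise SyntaxError(f"expected exactly one number, got {len(tokens)} tokens")
    return tokens[0]


if __name__ == "__main__":
    # The binary pattern stops at the first digit that is not 0 or 1,
    # so the remaining "567" is tokenized separately as a decimal number.
    for token in tokenize("0b001101567"):
        print(token)
```

With the NUMBER+ style rule, parse_number_list("0b001101567") accepts the input as two numbers, which matches the parse tree in the thread. With parse_single_number the same input is rejected, because the second token has nowhere to go. In a full grammar the same effect comes from whatever token the language requires between two numbers.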
_______________________________________________ Grammatica-users mailing list Grammatica-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/grammatica-users