JOHN! THANK YOU! You don't know how long I've been struggling with this - and now that you explain it, it makes perfect sense!
I will heed your warning about * and ? - I see how they match empty strings
now.

Thanks,
Nik

On Tue, Jan 12, 2010 at 9:21 PM, John B. Brodie <[email protected]> wrote:

> Greetings!
>
> Your WS lexer rule can recognize the empty string, this is VERY bad.
>
> Because WS can recognize the empty string your lexer will enter an
> infinite loop when encountering a character it can not deal with - like
> the '_' in your example - you have no lexer rule that can handle a '_'.
>
> More below...
>
> On Tue, 2010-01-12 at 20:52 -0500, Nik Molnar wrote:
> > Hello all,
> >
> > I am rather new to ANTLR and seem to be running into a small issue I
> > can't figure out.
> >
> > I'm writing a very simple grammar based on many tutorials online, the
> > calculator.
> >
> > This grammar generates C# code that compiles perfectly, and works for
> > the most part in ANTLRWorks Interpreter, Debugger and in a sample app I
> > made in .NET to call the generated Parser/Lexer.
> >
> > The problem I run into is when I put in invalid syntax, expecting an
> > error. Output like so:
> >
> > Valid Syntax: "3+3" => Works in interpreter, debugger and compiled .net
> > code.
> > Invalid Syntax: "3+/3" => Gives error in interpreter, debugger and
> > compiled .net code, as expected.
> > Invalid Syntax: "3_3" => The interpreter shows nothing, the debugger
> > cannot connect and the .net code hangs for a while then throws an out
> > of memory exception.
>
> Your lexer will correctly identify the first '3' as an INT. Next your
> lexer will see the '_' which it is unable to deal with. BUT since your
> WS rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' which it is unable to deal with. BUT since your WS
> rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' which it is unable to deal with. BUT since your WS
> rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' .... and so nothing good results.
>
> Your .NET app runs out of memory because the infinite sequence of empty
> WS tokens appended onto the HIDDEN channel just gobbles up all memory.
>
> The debugger can not connect because the connection happens after the
> lexer has finished tokenizing the input text. Your lexer never finishes
> so the debugger won't connect. I bet if you waited long enough you would
> eventually run out of memory in this case too.
>
> Same drill for the interpreter....
>
> >
> > I'm sure I'm doing something wrong in my grammar but don't know what.
> >
> > I've included it below. Please help me!
> >
> > Thanks,
> >
> > grammar Test;
> >
> > /*options
> > {
> >     language = 'CSharp2';
> > }*/
> >
> > expression
> >     : amExpression;
> >
> > amExpression
> >     :mdExpression ((PLUS|DASH) mdExpression)*
> >     ;
> >
> > mdExpression
> >     :INT ((STAR|SLASH) INT)*
> >     ;
> >
> > DASH
> >     :'-'
> >     ;
> >
> > SLASH
> >     :'/'
> >     ;
> >
> > WS
> >     : (' '
> >     | '\t'
> >     | '\n'
> >     | '\r')*
> >     { $channel = HIDDEN; }
> >     ;
>
> the * above should really be a +
>
> be VERY careful with rules that can recognize the empty string, e.g.
> have just a * or ? operator.
>
> I have NEVER found an instance where a lexer rule that accepts nothing
> (the empty string) does anything that helps.
>
> On RARE occasions, a parser rule that accepts the empty string can be
> appropriate, but needs to be examined VERY closely.
>
> >
> > STAR
> >     : '*'
> >     ;
> >
> > PLUS
> >     : '+'
> >     ;
> >
> > fragment DIGIT
> >     : '0'..'9'
> >     ;
> >
> > INT
> >     : (DIGIT)+
> >     ;
>
> Hope this helps...
>    -jbb
>
>
>
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
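
The loop John describes can be reproduced outside ANTLR with a toy lexer. The sketch below is plain Python, not ANTLR-generated code, and it only approximates how a real ANTLR lexer picks a rule: at each position it tries every rule and takes the longest match. The names tokenize, make_rules, STAR_WS, PLUS_WS and the max_empty cap are all made up for this illustration. The cap exists only so the demo stops instead of eating memory the way the real lexer did.

```python
import re

# Two versions of the WS rule from the thread, written as regexes.
STAR_WS = r"[ \t\n\r]*"   # original rule: can match the empty string
PLUS_WS = r"[ \t\n\r]+"   # suggested fix: needs at least one character


def make_rules(ws_pattern):
    """Toy stand-ins for the lexer rules in the Test grammar."""
    return [
        ("INT", re.compile(r"[0-9]+")),
        ("PLUS", re.compile(r"\+")),
        ("DASH", re.compile(r"-")),
        ("STAR", re.compile(r"\*")),
        ("SLASH", re.compile(r"/")),
        ("WS", re.compile(ws_pattern)),
    ]


def tokenize(text, ws_pattern, max_empty=5):
    """Return (tokens, status). Stops after max_empty zero-length tokens."""
    rules = make_rules(ws_pattern)
    tokens, pos, empty = [], 0, 0
    while pos < len(text):
        # Longest match wins; an empty match still counts as a match.
        best_name, best_len = None, -1
        for name, rx in rules:
            m = rx.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            # Assumption: this toy simply stops at the first bad character.
            return tokens, "error at %d: no rule matches %r" % (pos, text[pos])
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
        if best_len == 0:
            # The position did not move, so the next pass sees the same input.
            empty += 1
            if empty >= max_empty:
                return tokens, "stuck at %d: empty tokens, no progress" % pos
    return tokens, "ok"


if __name__ == "__main__":
    for label, ws in (("WS with *", STAR_WS), ("WS with +", PLUS_WS)):
        for text in ("3+3", "3_3"):
            print(label, repr(text), tokenize(text, ws))
```

With the * version, "3_3" yields the INT and then nothing but empty WS tokens at the same position until the cap kicks in. With the + version the same input ends in an error at the '_', which is the kind of outcome Nik was expecting in the first place.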
