Greetings!

Have you looked at the Java grammar in the v3 example suite? Also...

On Thu, 2011-09-08 at 18:08 +0000, [email protected] wrote:
> Hello- I'm working on a grammar that needs to support embedded blanks in
> strings: "identifier=two words"
> The interpreter keeps breaking at 'two' and doesn't know what to do with
> 'words'.

don't use the interpreter. it has some quirks.

> I was initially ignoring white space (because 'id1 = oneword, id2 =" two
> words"' must also be supported with spaces around the = and ,), but
> obviously, can't do that.
> I have tried what was suggested in an archived post:
>
> STRING_LITERAL : (STRCHAR)+ ( ((' ')+ STRCHAR)=> (' ')+ (STRCHAR)+ )*

are you lexing the leading/trailing quote marks separately from the
characters comprising the string literal? if so, don't do that.

> But that didn't work either! (no viable alternative at input 'words'). It's
> not including 'words' as part of the string.
>
> In my grammar:
> fragment LETTER :('a'..'z' | 'A'..'Z');
> fragment DIGIT : '0'..'9';
> fragment OTHERCHARS : ('.' | '/' | '-' | '&');
> STRCHAR : (LETTER | DIGIT | OTHERCHARS)+;
>
> I have tried various combinations of handling the blank in the lexing v. the
> parsing, including trying to create a quoted-string rule.
> I will have to support the following:

you want the string literal to be processed completely by the lexer,
from the opening quote up to and including the closing quote. that way
no other tokens will interfere with handling the characters between the
quote marks.

>
> "identifier =two words"
> identifier ="two words"
>
> The identifier=value pairs appear in a comma-separated line. There are
> various nested structures of identifier=value pairs involved, which is why
> both of the above formats are supported.
>
> *** Bottom line*** I just want to indicate: If a space appears between
> quotation marks, include it as part of the current token; if not, throw it
> away.
>
> I have everything working in a complex structure and tree walker except for
> the embedded blanks allowed in strings!
> Any suggestions are appreciated.

these lexer rules work for me:

STRING : '"' (options{greedy=false;}:( ~('\\'|'"') | ('\\' '"')))* '"';
WS : ( ' ' | '\t' | '\f' | '\r' | '\n' )+ { $channel=HIDDEN; } ;

Hope this helps...
-jbb

List: http://www.antlr.org/mailman/listinfo/antlr-interest
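
To show how the STRING and WS rules might sit alongside the fragments from the quoted message, here is a minimal sketch of a combined ANTLR v3 grammar. The grammar name Pairs, the parser rules line, pair and value, and the token name WORD are hypothetical and only for illustration. The STRING and WS rules are copied from the reply, and the fragments come from the quoted grammar. It has not been run against the poster's real input, so treat it as a starting point rather than a finished grammar.

```antlr
grammar Pairs;   // hypothetical grammar name

// Parser rules (assumed shape, not taken from the original post).
line  : pair (',' pair)* EOF ;

pair  : WORD '=' value    // e.g. identifier ="two words"
      | STRING            // e.g. "identifier =two words"; splitting the
                          // contents is left to a later step
      ;

value : WORD | STRING ;

// The whole literal, opening quote through closing quote, is one token,
// so blanks inside the quotes never reach the WS rule.
STRING : '"' (options{greedy=false;}:( ~('\\'|'"') | ('\\' '"')))* '"';

// Unquoted run of letters, digits and the listed punctuation.
WORD   : (LETTER | DIGIT | OTHERCHARS)+ ;

// Whitespace outside string literals goes to the hidden channel.
WS     : ( ' ' | '\t' | '\f' | '\r' | '\n' )+ { $channel=HIDDEN; } ;

fragment LETTER     : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT      : '0'..'9' ;
fragment OTHERCHARS : '.' | '/' | '-' | '&' ;
```

Under these assumptions, an input such as identifier ="two words" should lex as WORD, '=', STRING, with the space before the = hidden and the space between the quotes kept inside the STRING token.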
