Thanks, adding state to the lexer seems to be the way to go.

2011/2/16 Mihai Maruseac <mihai.marus...@gmail.com>
> On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky <romand...@gmail.com> wrote:
> > Hi,
> >
> > Using alex+happy, how could I parse lines like these?
> >
> >> "mr <username> says <message>\n"
> >
> > where both <username> and <message> may contain arbitrary characters
> > (except eol)?
> >
> > If I make lexer tokens
> >
> >> "mr " { T_Mr }
> >> " says " { T_Says }
> >> \r?\n { T_Eol }
> >> . { T_Char $$ }
> >
> > and parser
> >
> >> 'mr ' { T_Mr }
> >> ' says ' { T_Says }
> >> eol { T_Eol }
> >> char { T_Char $$ }
> >
> > ...
> >
> >> line :: { (String, String) }
> >> : 'mr ' string ' says ' string eol { ($2, $4) }
> >
> >> string :: { String }
> >> : char { [ $1 ] }
> >> | char string { $1 : $2 }
> >
> > then I get an error when <username> or <message> contains a "mr "
> > substring, because the parser encounters a T_Mr token.
> >
> > A workaround is to mention all the small tokens in my <string> definition:
> >
> >> string :: { String }
> >> : { [] }
> >> | 'mr ' string { "mr " ++ $2 }
> >> | ' says ' string { " says " ++ $2 }
> >> | char string { $1 : $2 }
> >
> > but that is weird and I'm sure there is a better way.
> >
>
> I don't have an implementation right now, but you could try having some
> states or user data in which to record whether you have already parsed
> the 'mr ' part (etc.). I guess you could use a monadUserData parser (just
> as I found after a night without sleep [1]; that's solved now).
>
> --
> Mihai
>
> [1]:
> http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html
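For illustration, here is a minimal sketch of the "add state to the lexer" idea from this thread. It does not use alex; it hand-writes the same technique in plain Haskell so it runs on its own. In alex itself the states would normally be start codes, possibly combined with the `monadUserState` wrapper (presumably what "monadUserData" refers to). All names (`Token`, `T_Str`, `LexState`, `breakOn`, `lexLines`, `parseLines`, `parseInput`) are hypothetical. The sketch also assumes the username ends at the *first* " says " on its line, since the original grammar is ambiguous on that point.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical token type: like T_Mr / T_Says / T_Eol from the thread, but
-- the per-character T_Char is replaced by one T_Str per username/message.
data Token
  = T_Mr
  | T_Says
  | T_Eol
  | T_Str String
  deriving (Eq, Show)

-- Lexer states, playing the role of alex start codes.
data LexState
  = Start      -- at the start of a line, expecting "mr "
  | InName     -- inside <username>, scanning for " says "
  | InMessage  -- inside <message>, scanning for the end of line
  deriving (Eq, Show)

-- Split a string at the first occurrence of a (non-empty) delimiter,
-- dropping the delimiter itself.
breakOn :: String -> String -> Maybe (String, String)
breakOn delim = go []
  where
    go acc s
      | delim `isPrefixOf` s = Just (reverse acc, drop (length delim) s)
      | otherwise = case s of
          []       -> Nothing
          (c : cs) -> go (c : acc) cs

-- Stateful lexer: what counts as a keyword depends on the current state,
-- so "mr " and " says " inside a username or message are just text.
lexLines :: String -> Either String [Token]
lexLines = go Start
  where
    go :: LexState -> String -> Either String [Token]
    go Start "" = Right []
    go Start s
      | "mr " `isPrefixOf` s = (T_Mr :) <$> go InName (drop 3 s)
      | otherwise = Left ("expected \"mr \" at: " ++ show (take 20 s))
    -- Assumption: the username ends at the FIRST " says " on the line.
    go InName s = case breakOn " says " s of
      Just (name, rest)
        | '\n' `notElem` name -> ([T_Str name, T_Says] ++) <$> go InMessage rest
      _ -> Left "expected \" says \" before end of line"
    -- The message runs to the end of the line; "\r\n" is accepted too.
    go InMessage s = case break (== '\n') s of
      (msg, '\n' : rest) -> ([T_Str (stripCR msg), T_Eol] ++) <$> go Start rest
      _ -> Left "expected end of line after message"

    stripCR m
      | not (null m) && last m == '\r' = init m
      | otherwise = m

-- With these tokens the grammar is trivial: line -> 'mr ' str ' says ' str eol.
parseLines :: [Token] -> Either String [(String, String)]
parseLines [] = Right []
parseLines (T_Mr : T_Str name : T_Says : T_Str msg : T_Eol : ts) =
  ((name, msg) :) <$> parseLines ts
parseLines ts = Left ("unexpected tokens: " ++ show (take 5 ts))

parseInput :: String -> Either String [(String, String)]
parseInput s = lexLines s >>= parseLines
```

Loaded in GHCi, `parseInput "mr mr x says mr says says y\n"` evaluates to `Right [("mr x","mr says says y")]`. The keywords are only recognised in the state where they are expected, so the parser never sees a stray T_Mr inside a name or message. The workaround `string` rule is then unnecessary.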
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe