On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky <romand...@gmail.com> wrote:
> Hi,
>
> using alex+happy, how could I parse lines like these?
>
>> "mr <username> says <message>\n"
>
> where both <username> and <message> may contain arbitrary characters (except
> eol)?
>
> If I make lexer tokens
>
>> "mr "     { T_Mr }
>> " says "  { T_Says }
>> \r?\n     { T_Eol }
>> .         { T_Char $$ }
>
> and parser
>
>> 'mr '     { T_Mr }
>> ' says '  { T_Says }
>> eol       { T_Eol }
>> char      { T_Char }
>
> ...
>
>> line :: { (String, String) }
>>     : 'mr ' string ' says ' string eol { ($2, $4) }
>
>> string :: { String }
>>     : char { [ $1 ] }
>>     | char string { $1 : $2 }
>
> then I get error when <username> or <message> contain "mr "
> substrings, because parser encounters T_Mr token.
>
> Workaround is mention all small tokens in my <string> definition:
>
>> string :: { String }
>>     : { [] }
>>     | 'mr ' string { "mr " ++ $2 }
>>     | ' says ' string { " says " ++ $2 }
>>     | char string { $1 : $2 }
>
> but that is weird and I'm sure there is a better way.
>

I don't have an implementation at hand, but you could try keeping some
lexer states or user data that record whether you have already consumed
the 'mr ' part (and likewise the ' says ' part). I guess you could use
the monadUserState wrapper, which I only found myself after a night
without sleep [1] (solved now).

--
Mihai

[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html
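
To make the state idea concrete, here is a minimal sketch in plain Haskell rather than in Alex itself (in Alex, start codes would be the closest built-in mechanism). Everything beyond the token names from the question is an assumption made for illustration: the `State` type and its constructors, the `T_Text` token that replaces the per-character `T_Char`, and the helpers `splitAtFirst` and `lexer`. The point is only that "mr " is treated as a keyword at the start of a line and " says " as a keyword once, after the username, so neither can clash with the free text.

```haskell
import Data.List (stripPrefix)

-- Token names follow the question. T_Text is an assumption: it carries a
-- whole free-text field instead of one T_Char per character.
data Token = T_Mr | T_Says | T_Eol | T_Text String
  deriving (Eq, Show)

-- Hypothetical lexer states: which part of the line comes next.
data State = LineStart | InUser | InMessage

-- Split a string at the first occurrence of a (non-empty) separator.
splitAtFirst :: String -> String -> Maybe (String, String)
splitAtFirst sep = go []
  where
    go acc s
      | Just rest <- stripPrefix sep s = Just (reverse acc, rest)
    go _ [] = Nothing
    go acc (c : cs) = go (c : acc) cs

-- The state decides which keyword, if any, is recognised at this point.
lexer :: State -> String -> Either String [Token]
lexer LineStart [] = Right []
lexer LineStart s = case stripPrefix "mr " s of
  Just rest -> (T_Mr :) <$> lexer InUser rest
  Nothing   -> Left "expected \"mr \" at the start of a line"
lexer InUser s = case splitAtFirst " says " s of
  -- Only the first " says " is a keyword; later ones belong to the message.
  Just (name, rest) -> ([T_Text name, T_Says] ++) <$> lexer InMessage rest
  Nothing           -> Left "expected \" says \" after the username"
lexer InMessage s = (T_Text (dropCR msg) :) <$> end rest
  where
    (msg, rest) = break (== '\n') s
    -- Accept "\r\n" line endings by dropping a trailing carriage return.
    dropCR m = if not (null m) && last m == '\r' then init m else m
    end ('\n' : more) = (T_Eol :) <$> lexer LineStart more
    end _             = Right []   -- input ended without a final newline
```

With this sketch, `lexer LineStart "mr mr smith says mr says hi\n"` gives `Right [T_Mr,T_Text "mr smith",T_Says,T_Text "mr says hi",T_Eol]`, so the grammar no longer needs the `string` workaround. A username that itself contains " says " stays ambiguous in the input format; here the first occurrence wins.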