On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky <romand...@gmail.com> wrote:
> Hi,
> using alex+happy, how could I parse lines like these?
>> "mr <username> says <message>\n"
> where both <username> and <message> may contain arbitrary characters (except
> eol)?
> If I make lexer tokens
>> "mr "    { T_Mr }
>> " says " { T_Says }
>> \r?\n    { T_Eol }
>> .        { T_Char $$ }
> and parser
>> 'mr '    { T_Mr }
>> ' says ' { T_Says }
>> eol      { T_Eol }
>> char     { T_Char $$ }
> ...
>> line :: { (String, String) }
>>      : 'mr ' string ' says ' string eol { ($2, $4) }
>> string :: { String }
>>        : char        { [ $1 ] }
>>        | char string { $1 : $2 }
> then I get an error when <username> or <message> contains a "mr "
> substring, because the parser encounters a T_Mr token.
> A workaround is to mention all the small tokens in my <string> definition:
>> string :: { String }
>>        :                 { [] }
>>        | 'mr ' string    { "mr "    ++ $2 }
>>        | ' says ' string { " says " ++ $2 }
>>        | char string     { $1 : $2 }
> but that is weird and I'm sure there is a better way.

I don't have an implementation at hand, but you could try using lexer
states (Alex start codes) or user data to record whether you have already
lexed the 'mr ' part (and likewise ' says '). I guess you could use Alex's
monadUserData wrapper (which is what I ended up with after a sleepless
night [1]; that problem is solved now).
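To illustrate the start-code idea, here is a minimal sketch written as plain Haskell rather than as an Alex/Happy specification, so it runs without either tool. The `Mode` type stands in for Alex start codes; the names `Mode`, `Token`, `lexLine`, `parseLine`, `chars1` and `expect` are hypothetical. It assumes one message per line and that the first " says " after the user name is the separator. "mr " is only a keyword at the start of a line, " says " only while reading the user name, and inside the message everything up to end-of-line is plain text.

```haskell
-- Stand-in for Alex start codes: which part of the line we are lexing.
data Mode = LineStart | InUser | InMessage
  deriving (Eq, Show)

data Token = TMr | TSays | TEol | TChar Char
  deriving (Eq, Show)

-- Hand-written stateful lexer (hypothetical; not Alex-generated code).
lexLine :: String -> [Token]
lexLine = go LineStart
  where
    go _ [] = []
    -- End of line (\r?\n) is recognised in every mode and resets the mode.
    go _ ('\r':'\n':rest) = TEol : go LineStart rest
    go _ ('\n':rest) = TEol : go LineStart rest
    -- "mr " is a keyword only at the start of a line.
    go LineStart ('m':'r':' ':rest) = TMr : go InUser rest
    -- " says " is a keyword only while lexing the user name.
    go InUser (' ':'s':'a':'y':'s':' ':rest) = TSays : go InMessage rest
    -- Malformed line start: treat the rest of the line as plain text.
    go LineStart (c:rest) = TChar c : go InMessage rest
    -- Any other character is plain text in the current mode.
    go mode (c:rest) = TChar c : go mode rest

-- Mirrors the grammar  line : 'mr ' string ' says ' string eol
parseLine :: [Token] -> Maybe (String, String)
parseLine (TMr : ts0) = do
  (user, ts1) <- chars1 ts0
  ts2 <- expect TSays ts1
  (msg, ts3) <- chars1 ts2
  ts4 <- expect TEol ts3
  if null ts4 then Just (user, msg) else Nothing
parseLine _ = Nothing

-- string : char | char string   (one or more TChar tokens)
chars1 :: [Token] -> Maybe (String, [Token])
chars1 ts = case span isChar ts of
  ([], _) -> Nothing
  (cs, rest) -> Just ([c | TChar c <- cs], rest)
  where
    isChar (TChar _) = True
    isChar _ = False

-- Consume exactly the given token or fail.
expect :: Token -> [Token] -> Maybe [Token]
expect t (t' : rest) | t == t' = Just rest
expect _ _ = Nothing
```

In GHCi, `parseLine (lexLine "mr mr smith says bob says hi\n")` should give `Just ("mr smith","bob says hi")`, because the lexer mode, not the grammar, decides when "mr " and " says " are keywords. With Alex itself the same effect would come from start codes on the `"mr "` and `" says "` rules.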


[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html

Haskell-Cafe mailing list
