The semantics of Parsec's "optional" operation are what is causing the problem.

"optional foo" can have 3 results:
  1) foo can succeed, optional succeeds, proceed to next command
2) foo can fail without consuming any input, optional succeeds proceed to next command
  3) foo can fail after consuming some input, optional fails, do not proceed

> minilang = do
>        char 'a'
>        optional (do {comma ; char 'b'})

The comma in the above line consumes input even in the "a,c" case. When "c" is seen the "char 'b'" fails and then the optional fails, and you get the error message you posted.

>        optional (do {comma ; char 'c'})
>        eof
>        return "OK"

> Apparently, "try" was used (do note that the column number indicates
> that there was backtracking) but the parser still fails for
> "a,c". Why?

Your next attempt does not fix the problem, since the try is in the wrong place ( http://www.cs.uu.nl/~daan/download/parsec/parsec.html#try may help)

minilang = do
       char 'a'
       try (optional (do {comma ; char 'b'}))

In the above line, the ",c" causes (char 'b') to fail, which causes 'optional' to fail, and then "try" also fails. The "try" alters the stream so that the "comma" was not consumed, but the "try" still passes along the failure.

In neither the original or the modified minilang does the 'char "c"' line ever get reached in the "a,c" input case.

The working solution is a small tweak:

minilang = do
       char 'a'
       optional (try (do {comma ; char 'b'}))
       optional (do {comma ; char 'c'})
       eof
       return "OK"

Now the "a,c" case causes the (char 'b') to fail, and then the "try" also fails, but also acts as if the comma had not been consumed. Thus we are in case #2 of the semantics of "optional" and so "optional" succeeds instead of failing, allowing the next line to parse ",c" then eof then return "OK".

There is a very very important difference to Parsec between failing with and without having consumed input. It means Parsec can be more efficient, since any branch that consumes input cannot backtrack. The "try" command is a way to override this optimization and allow more backtracking.

The other solution presented on this list was:

minilang = do
       char 'a'
       try b <|> (return '-')
       optional c
       eof
       return "OK"
  where
  b = do { comma ; char 'b' }
  c = do { comma ; char 'c' }

In this case, the "optional" was replace by (<|> (return '-')). In fact you could define optional this way:

optional :: GenParser tok st a -> GenParser tok st ()
optional foo = (foo >> return ()) <|> (return ())

Thus "optional (try b)" is actually the same as "(b >> return ()) <|> (return ())". So you can see my suggestion is really identical the previous one.

I could not help generalizing your toy problem to an ordered list of comma separated Char. Note that "try" is not actually needed in listlang, but it would be if (char x) were replaced by something that can consume more than a single character:

listlang :: [Char] -> GenParser Char st [Char]
listlang [] = eof >> return []
listlang (x:xs) = useX <|> listlang xs
  where useX = do try (char x)
                  rest <- end <|> more
                  return (x:rest)
        end = (eof >> return [])
        more = comma >> listlang xs

Now minilang (the fixed version) is the same as (listlang ['a','b','c']) or (listlang "abc"). This is a good example:

*Main> run (listlang "abcd") "c,b"
parse error at (line 1, column 3):
unexpected "b"
expecting "d" or end of input

--
Chris

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to