Hello,

I have some input which is divided up into segments like this:


["foo", "hi", "there", "world"]

And I want to use parsec to parse the segments. I am looking for a way to be
able to use Char parsers on each segment, but also parse the list of
segments as a whole.


I have an implementation which works, but I am not feeling that satisfied
with it. An example parser looks like this:

testp :: GenParser String () (Char, String, String)
testp =
  do segment "foo"
     st <- p2u (char 'h' >> char 'o')
     sg <- anySegment
     sg' <- anySegment
     return (st,sg, sg')

If you use it to parse ["foo", "hi", "there", "world"] you get:

*Main> test
["foo","hi","there","world"] (segment 2 character 2):
unexpected "i"
expecting "o"

This is good -- the error tells you the segment and the offset. The
combinators, segment and anySegment match on String tokens. The p2u function
converts a Char parser into a String parser.

The primary issue is that I can not figure out how to implement p2u so that
it works under Parsec 2 and 3 with out changes.

The secondary issue is that I feel like the DSL is not that great. I would
rather write the above parser like this:

testp :: GenParser String () (Char, String, String)
testp =
  do string "foo"
       nextSegment
       char 'h' >> char 'o'
       nextSegment
       sg <- many anyChar
       nextSegment
       sg' <- many anyChar
       return (st,sg, sg')

so, the combinators like 'many' would only see to the end of the current
segment. nextSegment would bring the next segment into context (or possibly
fail if the current segment was not completely consumed). I could also,
perhaps, reimplement anySegment as:

anySegment =
   do s <- many anyChar
        nextSegment
        return s

and write:

testp :: GenParser String () (Char, String, String)
testp =
  do string "foo"
       nextSegment
       char 'h' >> char 'o'
       nextSegment
       sg <- anySegment
       sg' <- anySegment
       return (st,sg, sg')

I don't see a way to implement this on top of parsec though.

Am I missing some clever ideas here?

I have attached my code that implements the first method. Any ideas how to
make it parsec 2/3 agnostic ? The primary issue is that if I use runParser
on the inner Char parser, I need some way to stick a ParseError back into
the parent context (with out just converting it to a String and calling
fail). I can see how to get/set the inputState, etc, in a portable way, but
I don't see how to set the ParseError in a portable way.The attached code
requires 3.1.0 because it uses mkPT / runParsecT :(

thanks!
- jeremy
import Text.ParserCombinators.Parsec
import Text.Parsec.Prim
import Text.ParserCombinators.Parsec.Error

segment :: String -> GenParser String () String
segment x = pToken (const x) (\y -> if x == y then Just x else Nothing)


anySegment :: GenParser String () String
anySegment = pToken (const "any string") Just

pToken msg f = do pos <- getPosition
                  token id (const $ incSourceLine pos 1) f
                  
p2u :: GenParser Char () a -> GenParser String () a
p2u p = 
  mkPT $ \state@(State sInput sPos sUser) -> 
  case sInput of
    (s:ss) ->
       do r <- runParsecT p (State s sPos sUser)
          return (fmap (fmap (fixReply ss)) r)

    where
      fixReply :: [String] -> (Reply String u a) -> (Reply [String] u a)
      fixReply _ (Error err) = (Error err)
      fixReply ss (Ok a (State "" sPos sUser) e) = (Ok a (State ss sPos sUser) e) 
      fixReply ss (Ok a (State s sPos sUser) e) = (Ok a (State (s:ss) sPos sUser) e)  
     
test :: IO ()
test = 
  let segments = ["foo", "hi", "there", "world"] in
  case parse testp (show segments) segments of
    (Left e) -> putStrLn $ showParseError e
    (Right r) -> print r
  
testp :: GenParser String () (Char, String, String)
testp =  
  do segment "foo"
     st <- p2u (char 'h' >> char 'o')
     sg <- anySegment
     sg' <- anySegment
     return (st,sg, sg')


showParseError :: ParseError -> String
showParseError pErr =
  let pos    = errorPos pErr
      posMsg = sourceName pos ++ " (segment " ++ show (sourceLine pos) ++ " character " ++ show (sourceColumn pos) ++ "): "
      msgs   = errorMessages pErr
  in posMsg ++ showErrorMessages "or" "unknown parse error" "expecting" "unexpected" "end of input" msgs
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to