Hi Doaitse, that is very interesting, and I'll take a precise look at the uu-parsinglib. Regarding my original question there exist (I believe) one serious problem: existing code is written exclusively using Parsec and it's already quite complex. At first glimpse I don't see an obvious way to use both libraries in one parsing module simulatiously. However, these are a very good news indeed, thank you
2009/8/9 S. Doaitse Swierstra <[email protected]> > The uu-parsinglib: > > > http://hackage.haskell.org/packages/archive/uu-parsinglib/2.2.0/doc/html/Text-ParserCombinators-UU-Core.html > > contains a combinator to achieve just this: > > -- parsing two alternatives and returning both rsults > pAscii = pSym ('\000', '\254') > pIntList = pParens ((pSym ';') `pListSep` (read <$> pList (pSym > ('0', '9')))) > parseIntString = pList (pAscii) > > parseBoth = pPair pIntList parseIntString > > pPair p q = amb (Left <$> p <|> Right <$> q) > > > The amb combinator tells you that it's parser parameter is ambiguous, and > returns you all the possible results. Amazingly it still maintains its > online behaviour. The only problem is that if either one of the parsers > fails then you will get only a single result. > > I have added the code above to the Examples.hs contained in the > uu-parsinglib (so it will show up in due time when I release a new version) > which I am attaching. Just load this file, and call the function main to see > what are the results of the different parsers and correction strategies. The > only problem is that if either one of the parsers fails you will only get > one of the results. If both fail you will get the result which fails latest > and if both fail at the same place, the one which fails with the least > repair costs. > > If you really want both results, even if the input is erroneaous, things > become more complicated, especially if you want to embed this parser in a > larger one, since then we have to check whether both parse the same prefix. > If needed I could put some work into this, by making a slightly different > version of the amb combinator. > > Doaitse > > > > > On 6 aug 2009, at 21:03, Dan Weston wrote: > > Paul, >> >> Arrows (and category theory in general) are interesting, but you certainly >> don't need to understand them for this. >> The only arrow in this code is the lowly function arrow (->). (&&&) and >> (|||) are duals of each other and mean, respectively, "both" and "either" >> (though for some bizarre reason, "both" is usually called "fanout"!) >> >> This style of pointfree (or "pointless") code is clearer to me because I >> don't have a bunch of variable names to invent and have lying around. >> >> Anyway, if you prefer, don't import Control.Arrow at all, and just use: >> >> -- |Both: Apply two functions to same argument and tuple the results >> infixr 3 &&& >> (&&&) :: (a -> b) -> (a -> c) -> a -> (b,c) >> (f &&& g) x = (f x, g x) >> >> -- |Either: If argument is Left, apply Left function, else apply Right >> function >> infixr 2 ||| >> (|||) :: (a -> c) -> (b -> c) -> Either a b -> c >> (|||) = either >> >> either is implicitly imported from the Prelude and is defined as: >> >> -- | Case analysis for the 'Either' type. >> -- If the value is @'Left' a@, apply the first function to @a@; >> -- if it is @'Right' b@, apply the second function to @b...@. >> either :: (a -> c) -> (b -> c) -> Either a b -> c >> either f _ (Left x) = f x >> either _ g (Right y) = g y >> >> Dan >> >> Paul Sujkov wrote: >> >>> Hi Dan, >>> thank you for the solution. It looks pretty interesting and usable, >>> however I'll have to spend some time understanding arrows: I never had an >>> opportunity to use them before. Anyway, it looks very close to what I >>> actually need, and in any case much less ugly than breaking the GenParser >>> encapsulation >>> 2009/8/6 Dan Weston <[email protected] <mailto: >>> [email protected]>> >>> Of course, since ParsecT s u m is a functor, feel free to use fmap >>> instead of parsecMap. Then you don't need to import from >>> Text.Parsec.Prim. >>> And in hindsight, I might prefer the name (<:>) or cons to (<>) for >>> the first function, but now I'm just obsessing. :) >>> Dan >>> Dan Weston wrote: >>> I think parsecMap does the job here: >>> ----------------------- >>> import Text.ParserCombinators.Parsec hiding ((<|>)) >>> import Text.Parsec.Prim(parsecMap) >>> import Control.Applicative((<|>)) >>> import Control.Arrow((|||),(&&&)) >>> -- Tagged (:) >>> (<>) :: Either Char Char -> Either String String -> Either >>> String String >>> Left a <> Left b = Left (a:b) >>> Left a <> Right b = Left (a:b) >>> Right a <> Left b = Left (a:b) >>> Right a <> Right b = Right (a:b) >>> -- Tagged concat >>> stringParser :: [Either Char Char] -> Either String String >>> stringParser = foldr (<>) (Right "") >>> -- Parse Integer if properly tagged, keeping unparsed string >>> maybeToInteger :: Either String String -> (Maybe Integer, String) >>> maybeToInteger = (const Nothing ||| Just . read) &&& (id ||| id) >>> -- Tagged-choice parser >>> intOrStringParser = parsecMap (maybeToInteger . stringParser) >>> $ many1 (parsecMap Right digit <|> parsecMap Left (noneOf ";)")) >>> -- Parse between parentheses >>> intOrStringListParser = between (char '(') >>> (char ')') >>> (sepBy1 intOrStringParser (char >>> ';')) >>> ----------------------- >>> Then you get a tagged version of each string, along with the >>> string itself: >>> *P> parseTest intOrStringListParser $ "(1;2w4;8;85)" >>> [(Just 1,"1"),(Nothing,"2w4"),(Just 8,"8"),(Just 85,"85")] >>> There may be some parsecMap-fold fusion optimization possible, >>> though I haven't looked into that. >>> Dan >>> Paul Sujkov wrote: >>> Hi everybody, >>> suppose I have two different parsers: one just reads the >>> string, and another one parses some values from it. E.g.: >>> parseIntList :: Parser [Integer] >>> parseIntList = do >>> char '(' >>> res <- liftM (map read) (sepBy1 (many1 digit) (char ';')) >>> char ')' >>> return res >>> parseIntString :: Parser String >>> parseIntString = manyTill anyChar eof >>> so for some input like this - "(1;2;3;4)" - I will have two >>> different result: >>> *Parlog> parseTest parseIntList "(1;2;3;4)" >>> [1,2,3,4] >>> *Parlog> parseTest parseIntString "(1;2;3;4)" >>> "(1;2;3;4)" >>> but the thing that I actually want is something like Parser >>> ([Integer], String) - results from both parsers at a time, >>> no matter whether one of them fails or not: >>> *Parlog> parseTest parseIntListAndString "(1;2;3;4)" >>> ([1,2,3,4], "(1;2;3;4)") >>> it is impossible at first sight, because first parser to use >>> will consume all the input, and there will be nothing to >>> parse for the second one >>> Parsec contains "choice" function, but it is implemented via >>> <|> and that is mplus - so it tries second alternative only >>> if the first one fails. Is it possible to use two parsers >>> for the same string (with try-like backtracking, no input >>> actually consumed till the second parser finishes)? I can >>> assume only dirty hacks with the GenParser internals - >>> manual position storing and backtracking - but that is >>> obviously not good >>> however, my first attempt to solve the problem was kind a >>> like that: to parse string to String, and then to use it as >>> an input for the next level parse call: >>> parseIntListAndString :: Parser ([Integer], String) >>> parseIntListAndString = do >>> str <- parseIntString >>> return (res str, str) >>> where res str = case (parse parseIntList "" str) of >>> Left err -> [] >>> Right val -> val >>> but the problems with such a method began when I switched >>> from Parser to GenParser with user state: function >>> parseIntList have to update the state, but it can't have the >>> same state as the parseIntListAndString any more: it has >>> it's own. I can explicitly pass the state from >>> parseIntListAndString to parseIntList, but I see no suitable >>> way for the parseIntList to update it. I can return the >>> updated state value from the parseIntList function, and call >>> setState on a result - but it seems rather ugly to mee. >>> However, if nothing else will do, that is an alternative >>> it is of course possible to use two different parsers >>> sequentially, but it is also very ineffective: I need to use >>> such multiple parsing on a relatively small substring of the >>> actual input, so little backtracking would be a much nicier >>> approach. Any suggestions? >>> -- Regards, Paul Sujkov >>> -- >>> Regards, Paul Sujkov >>> >> >> _______________________________________________ >> Haskell-Cafe mailing list >> [email protected] >> http://www.haskell.org/mailman/listinfo/haskell-cafe >> > -- Regards, Paul Sujkov
_______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
