On 02/01/2012 09:44, max wrote:
I want to write a function whose behavior is as follows:

foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
"string2\r\nstring3", "string4"]

Note the sequence "\r\n", which is ignored. How can I do this?
Doing it probably the hard way (and getting it wrong) looks like the following...

--  Function to accept (normally) a single character. Special-cases
--  \r\n. Refuses to accept \n. Result is either an empty list, or
--  an (accepted, remaining) pair.
parseTok :: String -> [(String, String)]

parseTok "" = []
parseTok (c1:c2:cs) | ((c1 == '\r') && (c2 == '\n')) = [(c1:c2:[], cs)]
parseTok (c:cs)     | (c /= '\n')                    = [(c:[], cs)]
                    | True                           = []

--  Accept a sequence of those (mostly single) characters
parseItem :: String -> [(String, String)]

parseItem "" = [("","")]
parseItem cs = [(j1s ++ j2s, k2s)
                 | (j1s,k1s) <- parseTok  cs
                 , (j2s,k2s) <- parseItem k1s
               ]

--  Accept a whole list of strings
parseAll :: String -> [([String], String)]

parseAll [] = [([],"")]
parseAll cs = [(j1s:j2s,k2s)
                | (j1s,k1s) <- parseItem cs
                , (j2s,k2s) <- parseAll  k1s
              ]

--  Get the first valid result, which should have consumed the
--  whole string but this isn't checked. No check for existence either.
parse :: String -> [String]
parse cs = fst (head (parseAll cs))

I got it wrong in that this never consumes the \n between items: parseItem fails outright when it reaches a \n instead of stopping there, so it'll all go horribly wrong. There's a good chance there's a typo or two as well. The basic idea should be clear, though - maybe I should fix it, but I've got some other things to do at the moment.

Think of the \n as a separator, or as a prefix to every "item" but the first. Alternatively, treat it as a prefix to *every* item, and artificially add an initial one to the string in the top-level parse function. Then use tail etc. to remove that from the first item.
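For what it's worth, here is a rough sketch of the "separator" variant, assuming the same list-of-results style as above. parseTok is restated with the same behaviour; the primed names (parseItem', parseAll', parse') are hypothetical and only there to keep it apart from the broken version. Treat it as one possible repair, not a definitive one.

```haskell
--  Accept one character, or "\r\n" as a single unit. Refuse a bare \n.
parseTok :: String -> [(String, String)]
parseTok ""             = []
parseTok ('\r':'\n':cs) = [("\r\n", cs)]
parseTok ('\n':_)       = []
parseTok (c:cs)         = [([c], cs)]

--  Accept as many tokens as possible. Stops (without failing) at the
--  first bare \n or at the end of the input.
parseItem' :: String -> [(String, String)]
parseItem' cs = case parseTok cs of
  [] -> [("", cs)]
  rs -> [ (j1s ++ j2s, k2s)
        | (j1s, k1s) <- rs
        , (j2s, k2s) <- parseItem' k1s
        ]

--  Accept an item, then either nothing more, or a \n separator
--  followed by further items.
parseAll' :: String -> [([String], String)]
parseAll' cs = [ (j1s : j2s, k2s)
               | (j1s, k1s) <- parseItem' cs
               , (j2s, k2s) <- rest k1s
               ]
  where
    rest ('\n':ks) = parseAll' ks   -- consume the separator here
    rest ks        = [([], ks)]

--  Take the first result; returns [] rather than crashing if there is none.
parse' :: String -> [String]
parse' cs = case parseAll' cs of
  ((items, _) : _) -> items
  []               -> []
```

With these definitions, parse' "string1\nstring2\r\nstring3\nstring4" gives ["string1", "string2\r\nstring3", "string4"]. The only real change is that an item now ends quietly at a \n, and the \n itself is eaten one level up.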

See http://channel9.msdn.com/Tags/haskell - there's a series of 13 videos by Dr. Erik Meijer. The eighth in the series covers this basic technique. It calls these parsers monadic and uses do notation, which confused me slightly at first: it's the *list* type which is monadic in this case, and (as you can see) I prefer to use list comprehensions rather than do notation.
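To illustrate the point about the list type being the monad here, a small sketch: the same two-character parser written once as a list comprehension and once in do notation. The names item, twoComp and twoDo are made up for this example and don't come from the videos.

```haskell
--  Accept any single character; fail (empty list) on empty input.
item :: String -> [(Char, String)]
item []     = []
item (c:cs) = [(c, cs)]

--  Two characters in a row, as a list comprehension.
twoComp :: String -> [(String, String)]
twoComp cs = [ ([a, b], k2)
             | (a, k1) <- item cs
             , (b, k2) <- item k1
             ]

--  The same thing in do notation, using the list monad.
twoDo :: String -> [(String, String)]
twoDo cs = do
  (a, k1) <- item cs
  (b, k2) <- item k1
  return ([a, b], k2)
```

Both give [("ab", "c")] for "abc" and [] for "a". Each generator in the comprehension corresponds to one bind line in the do block, so which one to use is mostly a matter of taste.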

There may be a simpler way, though - there's still a fair bit of Haskell and its ecosystem I need to figure out. There's a tool called alex, for instance, but I've not used it.
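One candidate for "simpler", sketched here on the assumption that no parser machinery is needed at all: plain recursion with an accumulator. The name splitLF is hypothetical.

```haskell
--  Split on bare \n, but keep "\r\n" inside the current item.
splitLF :: String -> [String]
splitLF = go ""
  where
    --  acc holds the current item in reverse order.
    go acc []             = [reverse acc]
    go acc ('\r':'\n':cs) = go ('\n' : '\r' : acc) cs   -- keep "\r\n" as data
    go acc ('\n':cs)      = reverse acc : go "" cs      -- bare \n ends the item
    go acc (c:cs)         = go (c : acc) cs
```

For the string in the question, splitLF returns ["string1", "string2\r\nstring3", "string4"]. The order of the clauses matters: the "\r\n" case has to be tried before the bare \n case.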


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
