Here's a handy simple function I've found very useful. You'll
obviously also need to import Debug.Trace:
pTrace s = pt <|> return ()
where pt = try $
do
x <- try $ many1 anyChar
trace (s++": " ++x) $ try $ char 'z'
fail x
It could perhaps be cleaner, but it does the job for me fine. Just
insert a line like pTrace "label" anywhere in your parsing functions
and whenever parsec hits that line you get a nice line of output:
"label: <rest of string to be parsed>" This tends to help track down
just where your code goes wrong. Try works like it should in my
experience, but that doesn't necessarily mean it works how you expect.
Regards,
s
On Jan 20, 2008, at 12:12 PM, Nicu Ionita wrote:
Hi,
I'm playing since a few hours with Parsec and trying to write a
small html
(fragment) parser, but I'm stuck in a point which I really can't
understand.
The problem seem to be either in "parseProperCont" or in
"closing" (see code
below). It looks like closing does not work (but it is a very simple
function!) or (also hard to believe) function "try" from Parsec has
some
problems.
Anyway I get this answer:
Prelude ParseSHtml> pf parseHtmlFrg "ptest.txt"
Left "ptest.txt" (line 5, column 2):
unexpected "/"
expecting element name
when I'm parsing this file:
<div id="normtext">
one line with break<br />
another line <br /><br />
Mail: <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>
</div>
with this code (sorry for the longer mail):
import Text.ParserCombinators.Parsec hiding (label)
import Text.XHtml.Strict
-- Helper function: parse a string up to one of the given chars
upTo :: [Char] -> Parser [Char]
upTo ds = many1 (noneOf ds)
parseHtmlFrg :: Parser Html
parseHtmlFrg = do many space
choice [parseElem, parseText]
<?> "html fragment"
parseElem :: Parser Html
parseElem = do en <- parseElTag
many1 space
(ats, cnt) <- restElem en
return $ compElem en cnt ! ats
<?> "html element"
-- Compose a html element from tag name and content
compElem en cnt = if isNoHtml cnt then itag en else tag en cnt
parseElTag :: Parser String
parseElTag = do char '<'
en <- elemName
return en
<?> "element tag"
elemName :: Parser String
elemName = many1 lower <?> "element name"
restElem :: String -> Parser ([HtmlAttr], Html)
restElem nm = do ats <- parseAttList
ht <- (restElNoCont <|> restElCont nm)
return (ats, ht)
<?> ("> or /> to close the tag " ++ nm)
-- Rest element with no content
restElNoCont = do char '/'
char '>'
return noHtml
<?> "/>"
-- Rest element with content
restElCont nm = do char '>'
many space
els <- parseProperCont nm
return $ concatHtml els
<?> "element with content"
-- Parse closing tag or proper content(s)
parseProperCont :: String -> Parser [Html]
parseProperCont nm = try (do closing nm
return []
)
<|> (do h <- parseHtmlFrg
hs <- parseProperCont nm
return (h:hs)
)
-- <|> return []
<?> "proper element content"
closing nm = do char '<'
char '/'
nm1 <- elemName
char '>'
if nm1 == nm
then return ()
else fail $ nm ++ ", encountered " ++ nm1
<?> ("closing of " ++ nm)
-- Parse a html attribute
parseAttr :: Parser HtmlAttr
parseAttr = do at <- many1 lower
char '='
va <- parseQuote
many space
return $ strAttr at va
<?> "Attribut"
parseAttList = many1 parseAttr <|> return [] <?> "attribute list"
-- Parse a quoted string
parseQuote :: Parser String
parseQuote = do char '"'
cs <- upTo ['"']
char '"'
return cs
-- Parse a text element
parseText :: Parser Html
parseText = do s <- upTo "<"
return (stringToHtml s)
<?> "some text"
-- For tests:
pf p file = parseFromFile p file
Nicu
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe