Hi!

I'm trying to extract HCards (http://microformats.org/wiki/hcard) from
HTML documents. HCard is a microformat. Microformats is an attempt to
add semantic information to XML documents without adding any new tags.
This is done by adding semantic information in class attributes
instead (see the 'testXml' string below).

I'm trying to find a good way to extract HCards into Haskell records.
To do this I need to map XML elements with certain attribute values
onto record fields. Some of the elements are optional in the XML and I
represent that using Maybe fields in my record. The order of the
elements is not guaranteed only the way they are nested. This makes it
more difficult to first extract the fields I want into a list of
Strings and then map that onto my record since I need to tag each
string with the value it represents.

So my question is. How can I write the function
'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool
for the job and I should be trying to walk the DOM tree instead?

module HCard where

import Text.XML.HXT.Arrow

data HCard = HCard
    {
      familyName :: String,
      givenName :: String
      org :: Maybe String
      url :: Maybe String
    } deriving Show

parseHCards xml = runX $ parseXml xml

parseXml xml =
    readString [(a_parse_html, v_1)] xml >>>
    deep (hasClassName "vcard") >>>
    extractElementsIntoRecords

extractElementsIntoRecords = undefined

hasClassName s = hasAttrValue "class" (elem s . words)

testXml =
    "<div class=\"vcard red\">" ++
    " <a class=\"url\" href=\"http://tantek.com/\";>" ++
    "  <span class=\"n\" style=\"display:none\">" ++
    "   <span class=\"family-name\">Çelik</span>" ++
    "   <span class=\"given-name\">Tantek</span>" ++
    "  </span>" ++
    "  <span class=\"fn\">Tantek Çelik</span>" ++
    " </a>" ++
    " <div class=\"org\">" ++
    "  <span class=\"organization-name\">Technorati</span>" ++
    " </div>" ++
    "</div>"

main = parseHCards textXml

Cheers,

Johan Tibell
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to