[Haskell-cafe] Extracting structured data in XML into records
Hi! I'm trying to extract HCards (http://microformats.org/wiki/hcard) from HTML documents. HCard is a microformat. Microformats is an attempt to add semantic information to XML documents without adding any new tags. This is done by adding semantic information in class attributes instead (see the 'testXml' string below). I'm trying to find a good way to extract HCards into Haskell records. To do this I need to map XML elements with certain attribute values onto record fields. Some of the elements are optional in the XML and I represent that using Maybe fields in my record. The order of the elements is not guaranteed only the way they are nested. This makes it more difficult to first extract the fields I want into a list of Strings and then map that onto my record since I need to tag each string with the value it represents. So my question is. How can I write the function 'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool for the job and I should be trying to walk the DOM tree instead? module HCard where import Text.XML.HXT.Arrow data HCard = HCard { familyName :: String, givenName :: String org :: Maybe String url :: Maybe String } deriving Show parseHCards xml = runX $ parseXml xml parseXml xml = readString [(a_parse_html, v_1)] xml deep (hasClassName vcard) extractElementsIntoRecords extractElementsIntoRecords = undefined hasClassName s = hasAttrValue class (elem s . words) testXml = div class=\vcard red\ ++ a class=\url\ href=\http://tantek.com/\; ++ span class=\n\ style=\display:none\ ++ span class=\family-name\Çelik/span ++ span class=\given-name\Tantek/span ++ /span ++ span class=\fn\Tantek Çelik/span ++ /a ++ div class=\org\ ++ span class=\organization-name\Technorati/span ++ /div ++ /div main = parseHCards textXml Cheers, Johan Tibell ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Extracting structured data in XML into records
On Saturday 24 February 2007 21:22, Johan Tibell wrote: So my question is. How can I write the function 'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool for the job and I should be trying to walk the DOM tree instead? module HCard where import Text.XML.HXT.Arrow data HCard = HCard { familyName :: String, givenName :: String org :: Maybe String url :: Maybe String } deriving Show parseHCards xml = runX $ parseXml xml parseXml xml = readString [(a_parse_html, v_1)] xml deep (hasClassName vcard) extractElementsIntoRecords extractElementsIntoRecords = undefined Perhaps something like the following (which is likely to be wrong seen I'm adlibing): extractElementsIntoRecords = findFName + findGName + findOrg + findURL where findX c = deep (hasName span hasAttrValue class (== c)) getChildren getText findFName = findX family-name arr Just findGName = findX given-name arr Just findOrg = (findX org arr Just) `withDefault` Nothing findURL = (deep (hasName a hasAttrValue class (== url)) getAttrValue href arr Just) `withDefault` Nothing and use the following at an appropriate place: composeHCard (Just fn:Just gn:morg:murl:xs) = (HCard fn gn morg murl):(compose xs) composeHCard _ = [] There's several other possibilities for dealing with bad data and simplifications you could do of course. Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] haskell-art mailing list
Is this something for the list at http://haskell.org/mailman/listinfo ? (Maybe this page could be moved to haskellwiki?) Met vriendelijke groet, Henk-Jan van Tuyl -- http://Van.Tuyl.eu/ -- On Sat, 24 Feb 2007 00:10:57 +0100, alex [EMAIL PROTECTED] wrote: Hi all, After a bit of discussion with Rohan Drape I've made a mailing list for inclusive discussion of 'artistic' uses of Haskell. The initial thought for it to be about Haskell sound and music in particular, but we decided to broaden it to include visual, robotic and related work as well. [Right now I'm seeing Haskell itself and all code written in it as high art, but I suppose this list is for discussion of Haskell code that outputs art as well.] If you'd like to join the list please visit this site: http://lists.lurk.org/mailman/listinfo/haskell-art Let me know if you have any problems subscribing. If it proves a popular and/or useful list, perhaps it could be moved to haskell.org in the future. alex -- Using Opera's revolutionary e-mail client: https://secure.bmtmicro.com/opera/buy-opera.html?AID=789433 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Illegal polymorphic or qualified type: forall l.
Hello Marc, Friday, February 23, 2007, 5:22:12 PM, you wrote: type ActionMonad a l = forall l. (HOccurs D1 l) = ( ReaderT l IO a ) 'l' should be either parameter of type constructor or forall'ed variable. it seems that you try to set limitations on type constructor parameter - thing that has another syntax and anyway not much support in haskell'98. i suggest you to use smth like the following instead: type ActionMonad a l = ( ReaderT l IO a ) instance (HOccurs D1 l) = Get CR (ActionMonad Bool ()) where get (CR a) = a -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
Someone asked if I'd include a classic C version of the Sieve in my comparisons. Having done so, Lennart wrote (slightly rephrased): How did you compare the C version with the Haskell versions? The Haskell programs produce the Nth prime, whereas the C code produces the last prime less than M. True. But since I have to know what M is to find the Nth prime, it's easy enough to ask the C code to produce the right prime. To make the C code to what the Haskell code does you need to set some upper bound that is related to the prime number distribution. I see no trace of this in your code. The Haskell versions that go up to a limit do this, so I could easily have written code to do it -- it's not hard, but has no real bearing on the time complexity of the code, so I didn't bother. You could argue that it's cheating to tell it so blatantly when to stop, but I hate the C code I'd found enough that I didn't really want to touch it any more than I had to. A much more legitimate complaint about the comparison with the C code is actually on space usage. It uses much more space than some of the algorithms it's competing with. More about that in an upcoming message. Melissa. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe