[Haskell-cafe] Extracting structured data in XML into records

2007-02-24 Thread Johan Tibell

Hi!

I'm trying to extract HCards (http://microformats.org/wiki/hcard) from
HTML documents. HCard is a microformat. Microformats is an attempt to
add semantic information to XML documents without adding any new tags.
This is done by adding semantic information in class attributes
instead (see the 'testXml' string below).

I'm trying to find a good way to extract HCards into Haskell records.
To do this I need to map XML elements with certain attribute values
onto record fields. Some of the elements are optional in the XML and I
represent that using Maybe fields in my record. The order of the
elements is not guaranteed only the way they are nested. This makes it
more difficult to first extract the fields I want into a list of
Strings and then map that onto my record since I need to tag each
string with the value it represents.

So my question is. How can I write the function
'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool
for the job and I should be trying to walk the DOM tree instead?


module HCard where

import Text.XML.HXT.Arrow

data HCard = HCard
{
  familyName :: String,
  givenName :: String
  org :: Maybe String
  url :: Maybe String
} deriving Show

parseHCards xml = runX $ parseXml xml

parseXml xml =
readString [(a_parse_html, v_1)] xml 
deep (hasClassName vcard) 
extractElementsIntoRecords

extractElementsIntoRecords = undefined

hasClassName s = hasAttrValue class (elem s . words)

testXml =
div class=\vcard red\ ++
 a class=\url\ href=\http://tantek.com/\; ++
  span class=\n\ style=\display:none\ ++
   span class=\family-name\Çelik/span ++
   span class=\given-name\Tantek/span ++
  /span ++
  span class=\fn\Tantek Çelik/span ++
 /a ++
 div class=\org\ ++
  span class=\organization-name\Technorati/span ++
 /div ++
/div

main = parseHCards textXml


Cheers,

Johan Tibell
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Extracting structured data in XML into records

2007-02-24 Thread Daniel McAllansmith
On Saturday 24 February 2007 21:22, Johan Tibell wrote:
 So my question is. How can I write the function
 'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool
 for the job and I should be trying to walk the DOM tree instead?

  module HCard where
 
  import Text.XML.HXT.Arrow
 
  data HCard = HCard
  {
familyName :: String,
givenName :: String
org :: Maybe String
url :: Maybe String
  } deriving Show
 
  parseHCards xml = runX $ parseXml xml
 
  parseXml xml =
  readString [(a_parse_html, v_1)] xml 
  deep (hasClassName vcard) 
  extractElementsIntoRecords
 
  extractElementsIntoRecords = undefined

Perhaps something like the following (which is likely to be wrong seen I'm 
adlibing):

extractElementsIntoRecords = findFName + findGName + findOrg + findURL
where
findX c = deep (hasName span  hasAttrValue class (== c))  
getChildren  getText
findFName = findX family-name  arr Just
findGName = findX given-name  arr Just
findOrg   = (findX org  arr Just) `withDefault` Nothing
findURL   = (deep (hasName a  hasAttrValue class (== url))  
getAttrValue href  arr Just) `withDefault` Nothing

and use the following at an appropriate place:

composeHCard (Just fn:Just gn:morg:murl:xs) = (HCard fn gn morg murl):(compose 
xs)
composeHCard _ = []

There's several other possibilities for dealing with bad data and 
simplifications you could do of course.


Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] haskell-art mailing list

2007-02-24 Thread Henk-Jan van Tuyl


Is this something for the list at
  http://haskell.org/mailman/listinfo
?

(Maybe this page could be moved to haskellwiki?)

Met vriendelijke groet,
Henk-Jan van Tuyl


--
http://Van.Tuyl.eu/
--


On Sat, 24 Feb 2007 00:10:57 +0100, alex [EMAIL PROTECTED] wrote:



Hi all,

After a bit of discussion with Rohan Drape I've made a mailing list for
inclusive discussion of 'artistic' uses of Haskell.

The initial thought for it to be about Haskell sound and music in
particular, but we decided to broaden it to include visual, robotic and
related work as well.  [Right now I'm seeing Haskell itself and all code
written in it as high art, but I suppose this list is for discussion of
Haskell code that outputs art as well.]

If you'd like to join the list please visit this site:
  http://lists.lurk.org/mailman/listinfo/haskell-art

Let me know if you have any problems subscribing.

If it proves a popular and/or useful list, perhaps it could be moved to
haskell.org in the future.


alex

--
Using Opera's revolutionary e-mail client:
https://secure.bmtmicro.com/opera/buy-opera.html?AID=789433

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Illegal polymorphic or qualified type: forall l.

2007-02-24 Thread Bulat Ziganshin
Hello Marc,

Friday, February 23, 2007, 5:22:12 PM, you wrote:

 type ActionMonad a l = forall l. (HOccurs D1 l)
= ( ReaderT l IO a )

'l' should be either parameter of type constructor or forall'ed
variable. it seems that you try to set limitations on type constructor
parameter - thing that has another syntax and anyway not much support
in haskell'98. i suggest you to use smth like the following instead:

 type ActionMonad a l = ( ReaderT l IO a )

 instance (HOccurs D1 l) = Get CR (ActionMonad Bool ()) where
   get (CR a) = a

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

2007-02-24 Thread Melissa O'Neill
Someone asked if I'd include a classic C version of the Sieve in my  
comparisons.  Having done so, Lennart wrote (slightly rephrased):
How did you compare the C version with the Haskell versions? The  
Haskell programs produce the Nth prime, whereas the C code produces  
the last prime less than M.


True.  But since I have to know what M is to find the Nth prime, it's  
easy enough to ask the C code to produce the right prime.


To make the C code to what the Haskell code does you need to set  
some upper bound that is related to the prime number distribution.   
I see no trace of this in your code.


The Haskell versions that go up to a limit do this, so I could easily  
have written code to do it -- it's not hard, but has no real bearing  
on the time complexity of the code, so I didn't bother.


You could argue that it's cheating to tell it so blatantly when to  
stop, but I hate the C code I'd found enough that I didn't really  
want to touch it any more than I had to.



A much more legitimate complaint about the comparison with the C code  
is actually on space usage.  It uses much more space than some of the  
algorithms it's competing with.  More about that in an upcoming message.


Melissa.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe