We demonstrate that Haskell as it is, with no TH or other pre-processors, can rather concisely represent semi-structured documents and the rules of their processing. In short, Haskell can implement SXML (ssax.sourceforge.net), right in its syntax and with the *open* and extensible set of `tags'. The benefit of Haskell is of course in static type guarantees, such as prohibiting an H1 element to appear in the character content of other elements. It also seems that the specification of various pre-post-order and context-sensitive traversals is more concise in Haskell compared to Scheme. Again, we are talking about the existing Haskell, i.e., Haskell98 plus common extensions. No Template Haskell or even overlapping instances are required.
The features of our framework are: - extensibility: at any time the user can add new `tags' and new transformation/traversal modes. - static safety: there are no dynamics, and no variant data types for elements or attributes, and thus no possibility of a run-time pattern-match failure. - static guarantees: the framework statically ensures that a `block-level element' cannot appear in the inline (i.e., character) content and that a character-content entity cannot appear in a pure element content. Some entities (e.g., `title') may be either element or attribute. OTH, many other entities may not occur in the attribute context. Thus the generated XML or HTML document will not only be well-formed: it would also satisfy some validity constraints. More validity constraints can be added to the framework. - flexibility of the traversal: the same document can be transformed in pre-, post-, accumulating or other ways, even within the same session. A document can be processed in a pure function, or in a monadic action, including an IO action. In the latter case, we can, e.g., verify URLs as we generate an HTML document. A `tag' can be transformed depending of the transformation modes and tag's context: e.g., the same `(title "title")' expression may appear as an attribute or an element -- and it will be rendered differently by the (X)HMTL serializer. - pleasant syntax, which closely resembles SXML. We can write (p "string" "string1" br "string3") without unnecessary commas and other syntactic distractions. We take polyvariadic functions to the new level (pun intended). We also exploit functional dependencies on a per-instance level -- which is a very useful but, alas, rarely used technique. We introduce two frameworks for representing semi-structured data in Haskell. In one, semi-structured data is truly a heterogeneous structure -- and so can be showed, stored, written and read. In the second approach, semi-structured data are represented as a monadic value, polymorphic over the rendering monad. Different choices of the monad give different views of data. In GHC, this amounts to representing semi-structured data by linked dictionaries; in JHC, both approaches are probably equivalent. The first representation is quite reminiscent of HList; the second has clear SYB3 (``Scratch your boilerplate with class'') overtones, although the realization is quite different. In particular, there is no need for recursive instances. Because of the amount of code involved (to describe common HTML tags and their constraints), this message is not the complete code. The code is available from http://pobox.com/~oleg/ftp/Haskell/HSXML.tar.gz Please see the files HSXML.hs, HSXML_HTML.hs and the sample code sample1c.hs for the data-centric framework. The monad-centric framework is in one self-contained file CSXML.hs. The first framework is more in line with SXML, and so will be considered here. Our running example, inspired by the Haskell.org web site, is: test_haskell = (document (head [title "Haskell" longdash "HaskellWiki"] [meta_tag [description "All about the language" br "Haskell"]]) (body [h1 "Haskell"] [div (attr [title "titleline"]) [p [[a (attr [href (FileURL "/haskellwiki/Image:Haskelllogo.jpg")]) "Haskell" br "A <purely functional> language"]] br ] [p "Haskell is a general purpose," [[em [[strong "purely"]] "functional"]] "programming language"]])) We should point out the absence of commas. The [[x]] syntax can be replaced with a simple [x], but that would require overlapping instances. So far, we have avoided overlapping instances. Incidentally, in many Scheme systems (and, reputedly, in R6RS) square brackets are taken to be synonymous to the round ones. We see that the |br| can be used in various contexts: in the character content of an element and of an attribute (cf. `description' for the latter). However, if we try to replace "Haskell" within the `description' attribute with [[em "Haskell"]] we get an error that Couldn't match `CT_attr' against `CT_inline' Expected type: CT_attr Inferred type: CT_inline OTH, the string "Haskell" that appears within |h1| element may be replaced with [[em "Haskell"]]. However, if we try to enter (h1 [[h1 "Haskell"]]) we get type error Couldn't match `CT_inline' against `CT_block' Indeed, the element |H1| is not allowed in the `inline' context. We can transform the test_haskell data structure in many ways (e.g., extract all the titles, renumber sections, etc). We can also render it in HTML. The result is quite predictable: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <head> <title>Haskell— HaskellWiki</title> <meta name="description" content="All about the language Haskell"></meta> </head> <body> <h1>Haskell</h1> <div title="titleline"> <p><a href="/haskellwiki/Image:Haskelllogo.jpg">Haskell <br> A <purely functional> language</a> <br> </p> <p>Haskell is a general purpose,<em><strong>purely</strong>functional</em>programming language</p> </div> </body> We should note that |title| -- which can be either an element or an attribute -- is indeed rendered differently depending on the context. Just to emphasize the extensibility of the framework, we show how easy it is to add new elements. For example, the `tags' |longdash|, |a|, |div| and |title| are not defined in the base file HSXML.hs. We add these tags in sample1c.hs, as follows: Let us start with an abbreviation for the long dash. It may appear in the character content of an element or an attribute > data LongDash = LongDash deriving Show > longdash :: Check_ia LongDash ct => HW ct LongDash > longdash = HW LongDash > > -- and how to render it in HTML > instance RenderInline (HW ct LongDash) where > render_inline f _ = emit_lit "—" >> return f Actually, the latter instance describes rendering of longdash in any |MonadRender m| -- any monad that defines morphisms |emit|, |emit_elem| and |emit_attr|. Anchor is an inline element with an inline content > data Anchor a b = Anchor a b deriving Show > a attrs body = > build (as_inline . HW . Anchor (as_block attrs) . rev'apppend HNil) > nil_inline body > > instance (Render a, RenderInline b) =>RenderInline (HW ct (Anchor a b)) where > render_inline f (HW (Anchor attrs body)) = > emit_elem "a"  (Just (render attrs)) (render_ib body) > >> return False Title can be either - a block-level element whose content is CT_inline - an attribute (whose content is, therefore, CT_attr) > newtype Title a = Title a deriving Show > title x = build ((`as_ctx` co) . HW . Title . rev'apppend HNil) nil_ab x > where nil_ab = HW HNil `as_ctx` ci > (ci,co) = title_ctx > class Check_ia (Title ()) i => TitleCtx i o | i -> o, o -> i where > title_ctx :: (HW i a, HW o a) ; title_ctx = undefined > instance TitleCtx CT_attr CT_battr > instance TitleCtx CT_inline CT_block It can be rendered context-sensitively: > instance RenderInline a => Render (HW CT_battr (Title a)) where > render (HW (Title x)) = emit_attr "title" > ((render_inline False x) >> return ()) > instance RenderInline a => Render (HW CT_block (Title a)) where > render (HW (Title x)) = emit_elem "title" [Hint_nl] Nothing > (render_ib x) It seems many of the SXML transformations (cf. the `examples' directory of the SSAX distribution or CVS repository, ssax.sf.net) can be rendered in Haskell. More examples are forthcoming. _______________________________________________ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell