apfelmus wrote: > Ah! I got struck by a trick: it's possible to store every tag name in > full only once but still present a simple tree with full tag names to > the user. You simply share all the tag names. For instance, in > > let x = "mytagname" in Tree (Tag x) [Tree (Tag x) [Text "foobar"]] > > the string "mytagname" is stored in memory only once although there are > two tags with this name. > > When parsing an XML-file, you look up every parsed tag name in a finite > map (i.e. the trie). If it's already in, you don't store the parsed tag > name in the XML tree but the one stored at the leaf in the trie. Of > course, these two strings are equal. But they're not (pointer-) > identical! After parsing, all equal tag names now are pointers to one > and the same leaf in the finite map. You can drop the finite map > structure afterwards, the pointers to the leaves will remain. > > That would be quite the memory reduction for large XML trees which are > likely to have many many identical tag names.
I've also thought about this approach. It sounds a bit weired, to built a map (or a trie) for the identity function. But it would solve a part of the space problem, at least when building XML documents by parsing. So I guess, there is a new project: A simple, small and lazy parser (no parsec), at least for the content part of XML. Uwe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe