Rene de Visser wrote:
> I think a step towards supporting medium-size documents in HXT would be to
> store the tags and content more efficiently.
> If I understand the code correctly, every tag is stored as a separate
> Haskell string. As each byte of a string under GHC takes 12 bytes, this alone
> leads to high memory usage. Tags tend to repeat. You could store them
> uniquely using a hash table. Content could be stored in compressed byte
> strings.

Yes, storing element and attribute names in a packed format, something similar to ByteString but for Unicode values, would reduce the amount of storage. A perhaps small shortcoming of that approach is the conversion between String and the internal representation whenever names are processed.

The hash table approach would of course reduce memory usage, but it would require a global change of the processing model: a document would no longer consist of a single tree; it would always consist of a pair of a tree and a map.

By the way, the amount of memory used for strings ([Char] values) in Haskell is a problem for ALL text processing tasks. It's not limited to HXT, nor is it specific to XML.

For me, the efficiency problems with strings as lists of chars, and a possible solution such as transparently implementing String data more efficiently than other lists, are an issue for Haskell' (or Haskell'') and/or a challenge for the language implementors.

Uwe
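
To make the "pair of a tree and a map" model above concrete, here is a minimal sketch. It is not HXT code: the types Tree and SymTab and the functions intern, internTree and externTree are hypothetical names invented for this illustration, and a Data.Map from the containers package stands in for the hash table.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- A tiny XML-like tree, parameterised over the type used for element names.
-- (Hypothetical; HXT's real tree type is different.)
data Tree n = Elem n [Tree n] | Text String
  deriving (Eq, Show)

type NameId = Int

-- Hypothetical symbol table: each distinct name is stored once
-- and referred to by a small integer id.
data SymTab = SymTab
  { toId   :: Map.Map String NameId
  , fromId :: Map.Map NameId String
  } deriving Show

emptySymTab :: SymTab
emptySymTab = SymTab Map.empty Map.empty

-- Look up a name, allocating a fresh id the first time it is seen.
intern :: SymTab -> String -> (SymTab, NameId)
intern st name =
  case Map.lookup name (toId st) of
    Just i  -> (st, i)
    Nothing ->
      let i = Map.size (toId st)
      in ( SymTab (Map.insert name i (toId st))
                  (Map.insert i name (fromId st))
         , i )

-- Replace every element name by its id, threading the table through
-- the traversal. The result is the (map, tree) pair.
internTree :: SymTab -> Tree String -> (SymTab, Tree NameId)
internTree st (Text s)    = (st, Text s)
internTree st (Elem n cs) =
  let (st1, i)   = intern st n
      (st2, cs') = mapAccumL internTree st1 cs
  in (st2, Elem i cs')

-- Convert back to String names; unknown ids become "?".
externTree :: SymTab -> Tree NameId -> Tree String
externTree _  (Text s)    = Text s
externTree st (Elem i cs) =
  Elem (Map.findWithDefault "?" i (fromId st)) (map (externTree st) cs)
```

The sketch also shows the cost mentioned above: every function that inspects a name now needs the table as an extra argument, and names have to be converted at the boundary to String-based code.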