On Thu, May 27, 2010 at 10:53 AM, Michael Snoyman <mich...@snoyman.com> wrote:
> In other words, here's what I think the three different benchmarks are
> really doing:
>
> * String: generates a list of Strings, passes each String to a relatively
> inefficient IO routine.
> * ByteString: encodes Strings one by one into ByteStrings, generates a list
> of these ByteStrings, and passes each ByteString to a very efficient IO
> routine.
> * Text: encodes Strings one by one into Texts, generates a list of these
> Texts, calls a UTF-8 encoding function to encode each Text into a
> ByteString, and passes each resulting ByteString to a very efficient IO
> routine.
>

If Text used UTF-8 internally rather than UTF-16, we could create Texts from
string literals much more efficiently, in the same manner as Char8.pack does
for bytestrings:

    {-# RULES
    "FPS pack/packAddress" forall s .
        pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
      #-}

This rule skips the creation of an intermediate String when packing a string
literal: the resulting ByteString points directly at the memory GHC allocates
(outside the heap) for the literal. An analogous rule could be added directly
to a builder monoid for lazy Texts so that no copying is done at all.

In addition, if Text were internally represented using UTF-8, encodeUtf8 would
be free.

Johan
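A minimal sketch of what the rule above amounts to, assuming the bytestring and text packages are available. The names viaString, viaAddress and viaText are made up for illustration. viaString is the form the rule rewrites from, viaAddress is roughly the form it rewrites to (written by hand with an Addr# literal rather than through the rule), and viaText is the Text path that goes through encodeUtf8:

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafePackAddress)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Left-hand side of the rule: pack applied to a String literal.
-- Without the rule firing, this walks an intermediate [Char].
viaString :: B.ByteString
viaString = B.pack "hello, world"

-- Roughly the right-hand side of the rule: wrap the literal's static,
-- NUL-terminated bytes directly, with no intermediate String.
viaAddress :: IO B.ByteString
viaAddress = unsafePackAddress "hello, world"#

-- The Text path: build a Text, then encode it to a UTF-8 ByteString.
viaText :: B.ByteString
viaText = TE.encodeUtf8 (T.pack "hello, world")

main :: IO ()
main = do
  direct <- viaAddress
  B.putStrLn direct
  -- All three routes should yield the same bytes for an ASCII literal.
  print (direct == viaString)
  print (viaText == viaString)
```

Whether the rule actually fires for viaString depends on optimization settings and the bytestring version, so this only shows that the routes agree on the resulting bytes, not how much each one costs.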
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe