Hi all,

On Sat, Mar 24, 2012 at 12:39 AM, Heinrich Apfelmus <apfel...@quantentunnel.de> wrote:
> Which brings me to the fundamental question behind this proposal: Why do we
> need Text at all? What are its virtues and how do they compare? What is the
> trade-off? (I'm not familiar enough with the Text library to answer these.)
>
> To put it very pointedly: is a %20 performance increase on the current
> generation of computers worth the cost in terms of ease-of-use, when the
> performance can equally be gained by buying a faster computer or more RAM?
> I'm not sure whether I even agree with this statement, but this is the
> trade-off we are deciding on.

Correctness
===========

Using list-based operations on Strings is almost always wrong as soon
as you move away from English text. You almost always have to treat
Unicode strings as blobs, considering several code points at once. For
example,

    import Data.Char (toUpper)

    upcase :: String -> String
    upcase = map toUpper

is terse, beautiful, and wrong, as several languages map a single
lowercase character to two uppercase characters (as I'm sure you're
aware). Perhaps this is OK to ignore when teaching students Haskell,
but it really hurts those who want to use Haskell as an engineering
language.

Performance
===========

Depending on the benchmark, the difference can be much bigger than
20%. For example, here's a comparison of decoding UTF-8 byte data into
a String vs a Text value:

    benchmarking Pure/decode/Text
    mean: 50.22202 us, lb 50.08306 us, ub 50.37669 us, ci 0.950
    std dev: 751.1139 ns, lb 666.2243 ns, ub 865.8246 ns, ci 0.950
    variance introduced by outliers: 7.553%
    variance is slightly inflated by outliers

    benchmarking Pure/decode/String
    mean: 188.0507 us, lb 187.4970 us, ub 188.6955 us, ci 0.950
    std dev: 3.053076 us, lb 2.647318 us, ub 3.606262 us, ci 0.950
    variance introduced by outliers: 9.407%
    variance is slightly inflated by outliers

A difference of almost 4x.

Many of the Text vs String benchmarks measure the performance of
operations while ignoring both decoding and encoding, whereas any real
application would have to do both.

On top of that, String is more or less as optimized as it can be; its
benchmarks are almost completely memory bound. Text, on the other
hand, still has potential for (large) improvements, as GHC doesn't
generate optimal code for tight loops over arrays. For example, we
know that GHC generates bad code for decodeUtf8 as used by Text's
stream fusion, hurting any code that uses fusion.
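
To make the two points above a bit more concrete, here is a minimal
sketch of the decode / transform / encode round trip that a real
application would perform, next to the list-based version from the
Correctness section. This is only an illustration, not the code behind
the benchmark numbers. The names upcaseString and upcaseUtf8 are made
up for this example, and it assumes the text and bytestring packages
are available.

```haskell
import qualified Data.ByteString as B
import Data.Char (toUpper)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- List-based version, as in the upcase example: exactly one Char out
-- for every Char in, so it cannot express one-to-many case mappings.
-- (upcaseString is a hypothetical name for this sketch.)
upcaseString :: String -> String
upcaseString = map toUpper

-- Round trip through Text: decode UTF-8 bytes, apply Text's case
-- conversion (which may change the length), and encode again.
-- NOTE: decodeUtf8 throws on invalid UTF-8; code that handles untrusted
-- input would more likely use decodeUtf8' and inspect the Either.
-- (upcaseUtf8 is a hypothetical name for this sketch.)
upcaseUtf8 :: B.ByteString -> B.ByteString
upcaseUtf8 = TE.encodeUtf8 . T.toUpper . TE.decodeUtf8
```

For the input "straße", upcaseString leaves the 'ß' untouched and
returns a six-character string, while upcaseUtf8 applied to the UTF-8
encoding of the same word returns the encoding of "STRASSE".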

Furthermore, the memory overhead of Text is smaller, which means that
applications that hold on to many string values will use less heap and
thus experience shorter "freezes" due to major GC collections, whose
cost is linear in the heap size.

Cheers,
Johan

_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime