Quoth John Millikin <jmilli...@gmail.com>,

> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
> increased complexity (caching decoded characters, multi-byte handling,
> etc).
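To make the quoted design concrete, here is a minimal Haskell sketch of "Text = (Encoding, ByteString)". All names here (Encoding, Text, charWidthAt) are hypothetical illustrations, not the API of any real library, and the encoding set is deliberately tiny:

```haskell
import qualified Data.ByteString as B

-- A hypothetical, closed set of supported encodings.
data Encoding = UTF8 | UTF16LE | ShiftJIS | Latin1
  deriving (Eq, Show)

-- "Text = (Encoding, ByteString)" spelled out as a record:
-- the bytes are kept in their original encoding, tagged with it.
data Text = Text
  { textEncoding :: Encoding
  , textBytes    :: B.ByteString
  } deriving (Eq, Show)

-- Each operation dispatches on the encoding tag; e.g. the byte width
-- of the character starting at a given byte offset.
charWidthAt :: Text -> Int -> Int
charWidthAt (Text Latin1 _) _ = 1          -- always single-byte
charWidthAt (Text UTF8 bs) i =             -- decode UTF-8 lead byte
  let b = B.index bs i
  in if b < 0x80 then 1
     else if b >= 0xF0 then 4
     else if b >= 0xE0 then 3
     else 2
charWidthAt (Text e _) _ =
  error ("charWidthAt: not sketched for " ++ show e)
```

This is the "re-implement text logic per encoding" cost John mentions: every operation needs a case per encoding, but no bulk decode ever happens.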
Ruby actually comes from the CJK world, in a way, doesn't it? Even if efficient per-encoding manipulation is a tough nut to crack, this design at least avoids the fixed cost of bulk decoding. An application designer doesn't need to weigh the pay-off of a correct text approach against `binary'/ASCII, and the language/library designer doesn't need to wonder whether genome data is a representative use case, etc.

If Haskell had the development resources to make something like this work, would it actually take the form of a Haskell-level type like that - data Text = (Encoding, ByteString)? I know that's just a clear and convenient way to express it for the purposes of the present discussion, and actual design is a little premature - but I think you could argue that, from the Haskell level, `Text' should be a single type, if the encoding differences aren't semantically interesting.

Donn Cave, d...@avvanta.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe