Quoth John Millikin <jmilli...@gmail.com>,

> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
> increased complexity (caching decoded characters, multi-byte handling,
> etc).

Ruby actually comes from the CJK world in a way, doesn't it?

Even if efficient per-encoding manipulation is a tough nut to crack,
it at least avoids the fixed cost of bulk decoding, so an application
designer doesn't need to weigh the pay-off of a correct text
approach vs. `binary'/ASCII, and the language/library designer doesn't
need to worry about whether genome data is a representative case, etc.

If Haskell had the development resources to make something like this
work, would it actually take the form of a Haskell-level type like
that - data Text = (Encoding, ByteString)?  I mean, I know that's
just a very clear and convenient way to express it for the purposes
of the present discussion, and actual design is a little premature,
but I think you could argue that from the Haskell level,
`Text' should be a single type, if the encoding differences aren't
semantically interesting.
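(For the sake of illustration only - the names and encodings here are
made up, not any real library's API - one way such a design might look
is a single abstract Text type that carries an encoding tag alongside
the raw bytes, with operations dispatching on the tag per value:)

```haskell
-- Hypothetical sketch: one abstract Text type pairing an encoding tag
-- with a ByteString, so encoding differences are invisible at use sites.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

data Encoding = ASCII | UTF8 | UTF16LE
  deriving (Show, Eq)

-- In a real design the constructor would be hidden behind the module
-- boundary; users would see only the abstract Text type.
data Text = Text Encoding B.ByteString

-- Character count, dispatching on the per-value encoding.
tLength :: Text -> Int
tLength (Text ASCII   bs) = B.length bs
tLength (Text UTF16LE bs) = B.length bs `div` 2   -- ignoring surrogate pairs, for brevity
tLength (Text UTF8    bs) =
  -- count bytes that start a character (i.e. skip 0x80-0xBF continuations)
  length [ w | w <- B.unpack bs, w < 0x80 || w >= 0xC0 ]

main :: IO ()
main = do
  print (tLength (Text ASCII (C.pack "hello")))           -- 5
  print (tLength (Text UTF8  (B.pack [0xC3, 0xA9, 0x21])))  -- "é!" -> 2
```

The point of the sketch is just that the (Encoding, ByteString) pair
can live inside one opaque type, so the encoding plumbing stays the
library's problem rather than the application's.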

        Donn Cave, d...@avvanta.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
