Alexey Khudyakov wrote:
But this raises the question of what "the right thing" is. If the locale is
UTF-8 or the system supports Unicode some other way, no problem: just encode
the string properly. The problem is how to deal with untranslatable
characters. Skip them? Replace them with question marks? Something else? We
probably need to look at how this is solved in other languages (or not solved).

Regarding untranslatable characters, I think the only correct thing to do is to treat them as exceptional behavior and have the conversion function accept a handler function which takes the character as input and produces a string for it. That way programs can define their own behavior, since this is something that doesn't have a "right" way to recover in all cases. Canonical handlers which skip, replace with question marks (or some other arbitrary character), throw actual exceptions, etc. could be provided for convenience.
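
To make the handler idea concrete, here is a rough sketch. All names in it (Handler, toLatin1, encodeWith, skip, replaceWith, strict) are made up for illustration and do not come from any existing library. It also assumes a Latin-1 target and a handler that returns bytes in the target encoding, purely to keep the example small.

```haskell
import Control.Exception (Exception, throw)
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical API sketch; none of these names exist in a real library.

-- | A handler receives the untranslatable character and returns the
-- bytes to emit in its place.
type Handler = Char -> [Word8]

-- | Basic converter for an assumed Latin-1 target. Nothing marks a
-- character that the target encoding cannot represent.
toLatin1 :: Char -> Maybe Word8
toLatin1 c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- | Encode a string, delegating every untranslatable character to the
-- caller-supplied handler.
encodeWith :: Handler -> String -> [Word8]
encodeWith h = concatMap step
  where
    step c = maybe (h c) (: []) (toLatin1 c)

-- Canonical handlers provided for convenience.

-- | Drop the character silently.
skip :: Handler
skip _ = []

-- | Substitute a fixed byte, e.g. 63 for '?'.
replaceWith :: Word8 -> Handler
replaceWith w _ = [w]

newtype EncodeError = Untranslatable Char
  deriving Show

instance Exception EncodeError

-- | Throw an actual exception (raised lazily, when the output is forced).
strict :: Handler
strict c = throw (Untranslatable c)
```

With this, `encodeWith skip`, `encodeWith (replaceWith 63)` and `encodeWith strict` give three different policies from the same converter, and a program with unusual needs can pass its own function instead.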

For stream-based "strings" à la ByteString, dealing with this sort of handler in an efficient manner is fairly straightforward (though some CPS tricks may be needed to get rid of the Maybe in the result of the basic converter). For [Char] strings efficiency is harder, but the implementation should still be easy (given the basic converter).
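
As an illustration of the kind of CPS trick I have in mind, the basic converter can take a success continuation and a failure continuation instead of returning a Maybe. This is only a sketch over [Char], again with an assumed Latin-1 target and made-up names (toLatin1K, encodeWithK). Whether it actually buys anything depends on how well the compiler inlines it.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical names; an assumed Latin-1 target keeps the example short.

-- | CPS form of the basic converter. Rather than building a Maybe, it
-- calls 'ok' with the encoded byte or 'bad' with the offending character.
toLatin1K :: Char -> (Word8 -> r) -> (Char -> r) -> r
toLatin1K c ok bad
  | ord c < 256 = ok (fromIntegral (ord c))
  | otherwise   = bad c

-- | Encode a string; the handler supplies replacement bytes for each
-- untranslatable character. No intermediate Maybe is constructed.
encodeWithK :: (Char -> [Word8]) -> String -> [Word8]
encodeWithK h = foldr step []
  where
    step c rest = toLatin1K c (: rest) (\u -> h u ++ rest)
```

For example, `encodeWithK (const [63])` replaces each untranslatable character with a question mark, and `encodeWithK (const [])` skips it.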

Most extant languages I've seen tend to pick a single solution for all cases, but I don't think we should follow along that path.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
