On Thu, Oct 09, 2014 at 06:04:02PM +0200, David Kastrup wrote:
> What I am actually more interested in is in having libunistring offer
> "roundtrippable" encodings as a fallback for decoding errors.
> Basically, I want an option for decoding where libunistring announces
> "what you have here is not valid utf-8 but I know how to deal with it".
> Including reencoding. And delivering unique "character codes" and
> string length calculations. The application would either keep track of
> having received "dirty utf-8" and would reencode when putting out utf-8
> (where reencoding "internal utf-8" to "external utf-8" means replacing
> the 2-byte sequences representing a wild byte by their original byte),
> or it would reencode into "external" utf-8 when writing anyway which
> would not change anything for originally valid utf-8.

It sounds like a reasonable philosophy to me. I don't think I'd want this to become the only option for libunistring, but if there's a practical way to add alternate interfaces, etc., then I think that would be valuable. (I am not the libunistring maintainer and don't intend to speak for him.)
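
To make the idea concrete, here is a rough Python sketch of the round trip as I understand it. None of this is libunistring API. The function names are made up, and the choice of the overlong pairs 0xC0/0xC1 + continuation byte as the "2-byte sequences representing a wild byte" is my assumption; the proposal does not say which sequences would be used. Python's "surrogateescape" error handler is used here only as a convenient way to locate the invalid bytes.

```python
def to_internal(data: bytes) -> tuple[bytes, bool]:
    """Convert external bytes to "internal utf-8" (hypothetical scheme).

    Valid UTF-8 is copied unchanged. Each wild byte 0x80..0xFF becomes
    a 2-byte pair 0xC0/0xC1 + continuation byte (assumed encoding).
    Returns the internal bytes and a "dirty" flag.
    """
    # surrogateescape maps each undecodable byte b to U+DC00 + b.
    text = data.decode("utf-8", errors="surrogateescape")
    out = bytearray()
    dirty = False
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            b = cp - 0xDC00                    # the original wild byte
            out.append(0xC0 | ((b >> 6) & 1))  # 0xC0 or 0xC1
            out.append(0x80 | (b & 0x3F))
            dirty = True
        else:
            out += ch.encode("utf-8")
    return bytes(out), dirty


def to_external(internal: bytes) -> bytes:
    """Replace each wild-byte pair by its original byte.

    0xC0 and 0xC1 never occur in valid UTF-8, so they can only be the
    pairs written by to_internal. Valid input passes through as-is.
    """
    out = bytearray()
    i = 0
    while i < len(internal):
        b0 = internal[i]
        if b0 in (0xC0, 0xC1) and i + 1 < len(internal):
            b1 = internal[i + 1]
            out.append(0x80 | ((b0 & 1) << 6) | (b1 & 0x3F))
            i += 2
        else:
            out.append(b0)
            i += 1
    return bytes(out)


def char_length(internal: bytes) -> int:
    """Count characters; each wild byte counts as one character."""
    return sum(1 for b in internal if (b & 0xC0) != 0x80)
```

With this scheme, to_external(to_internal(data)[0]) gives back the original bytes for any input, and writing through to_external unconditionally changes nothing for input that was valid to begin with, which matches the second variant described in the quoted text.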
