https://issues.dlang.org/show_bug.cgi?id=17861
Jonathan M Davis <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
                   |                            |m
           Hardware|x86                         |All
                 OS|Windows                     |All

--- Comment #8 from Jonathan M Davis <[email protected]> ---
This has been discussed before. There's a strong argument for making it so that decode uses the replacement character by default (it's even what the Unicode standard says you should do), and all string-based stuff then follows suit, at which point anyone wanting exceptions would need to call decode manually with the template argument indicating that that's what they wanted - which is the opposite of what we have now. And Walter is actually in favor of using the replacement character instead of exceptions and possibly even making the change in spite of the issues, but there have been some folks who have been strongly opposed to that. The problem is twofold:

1. Making the change risks silently breaking a ton of code.

2. Others (Vladimir in particular IIRC) have argued about how negative it is to have the contents of strings silently changed, since there are cases where it would be highly detrimental for that to happen.

And on some level, all of this gets wrapped into the auto-decoding debate, because that's the main reason that this is out of the control of the user. front and popFront on strings call decode for you and call it in the way that results in exceptions on invalid UTF instead of using the replacement character. Anyone making the calls manually has the choice.

So, I think that the chances are very high that we would go with the replacement character by default rather than exceptions (maybe not even have the exceptions at all) if we were starting from scratch - just like we wouldn't have auto-decoding if we were starting from scratch. But it's highly questionable that we can get away with making the change now due to the ramifications that it will have on existing code.
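To make the difference between the two behaviors concrete, here is a minimal sketch. The helper name lossyFront is hypothetical (it is not part of Phobos), and the input bytes are made up for illustration; the only library pieces assumed are std.utf.decode with its UseReplacementDchar template argument, std.utf.replacementDchar, and the auto-decoding front from std.range.primitives.

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.typecons : Yes;
import std.utf : decode, replacementDchar, UTFException;

// Hypothetical helper (not part of Phobos): decode the code point at the
// start of `s`, yielding U+FFFD for invalid UTF-8 instead of throwing.
// `s` must be non-empty, since decode requires a valid index.
dchar lossyFront(const(char)[] s)
{
    size_t index = 0;
    return decode!(Yes.useReplacementDchar)(s, index);
}

void main()
{
    // The byte 0xFF can never appear in well-formed UTF-8.
    char[] bad = [cast(char) 0xFF, 'b'];

    // Auto-decoding: front calls decode in its throwing mode.
    assertThrown!UTFException(bad.front);

    // Calling decode manually lets the caller opt in to the
    // replacement character instead.
    assert(lossyFront(bad) == replacementDchar);
    assert(lossyFront("abc") == 'a');
}
```

The point of the sketch is only that the choice exists when decode is called directly; front and popFront on strings do not expose it.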
At this point, the situation with decoding code points and not having it throw is in pretty much the same boat as using strings with range-based code and not auto-decoding: you have to use wrappers like byCodeUnit and/or special-case your code on strings. And to avoid the exceptions on bad Unicode, you either have to not be decoding code points, or you need to do so yourself with std.utf.decode. No, that's not ideal, but no one has been able to come up with a reasonable way to change the status quo with any kind of reasonable deprecation process.
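The two workarounds mentioned above might look roughly like the following. This is a sketch under assumptions: decodeLossy is a hypothetical helper name, not a Phobos function, and the sample bytes are invented. It relies only on std.utf.byCodeUnit, std.utf.decode with Yes.useReplacementDchar, and std.algorithm.searching.canFind.

```d
import std.algorithm.searching : canFind;
import std.typecons : Yes;
import std.utf : byCodeUnit, decode, replacementDchar;

// Hypothetical helper (not part of Phobos): decode a whole string by hand,
// substituting U+FFFD wherever the input is not valid UTF-8.
dchar[] decodeLossy(const(char)[] s)
{
    dchar[] result;
    for (size_t i = 0; i < s.length; )
        result ~= decode!(Yes.useReplacementDchar)(s, i); // decode advances i
    return result;
}

void main()
{
    // 'a', a stray 0xFF byte (invalid in UTF-8), then 'b'.
    char[] bad = ['a', cast(char) 0xFF, 'b'];

    // Workaround 1: do not decode at all. byCodeUnit yields a range of
    // char, so range algorithms never hit the throwing decode path.
    assert(bad.byCodeUnit.length == 3);
    assert(bad.byCodeUnit.canFind('b'));

    // Workaround 2: decode manually and ask for the replacement character.
    auto points = decodeLossy(bad);
    assert(points[0] == 'a');
    assert(points.canFind(replacementDchar));
}
```

Which workaround fits depends on whether the code actually needs code points; if it only searches or copies, staying at the code-unit level avoids the question entirely.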
