https://issues.dlang.org/show_bug.cgi?id=20134
Jon Degenhardt <jrdemail2000-dl...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jrdemail2000-dl...@yahoo.co
                   |                            |m

--- Comment #5 from Jon Degenhardt <jrdemail2000-dl...@yahoo.com> ---

The correct handling of invalid UTF sequences is often known only to the application; that is, it is task dependent. And in some applications, the appropriate handling may not be known until runtime, making compile-time decisions problematic.

A related piece of the puzzle is that in many high performance string processing applications, it is useful to switch between modes of processing: strings are handled as bytes for some algorithms, then handled again as character sequences for others. When operating on bytes, UTF interpretation is neither needed nor desired (so no detection of invalid UTF sequences). But when algorithms operate on characters, invalid UTF detection and handling is desired or required. (Note: Many of these algorithms are possible because ASCII characters in UTF-8 can be used as single-byte markers without interpretation of other parts of the byte stream.)

This makes it difficult for libraries to implement a single policy and still nicely support the wide range of application use-cases, especially when there may be many layers of code between the application layer making a call and the lower-level function where the opportunity for detection occurs.

As an application developer, what I'd really like to have is a magical context object where the current detection and handling policies are set, and have all code invoked within the scope of that object obey them. I'd gladly take a performance hit to get it. This may be too big a change, but it's worth considering how well other solutions compare from an application development perspective.

--
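To illustrate the byte-mode/character-mode point above, here is a minimal sketch (in Python rather than D, purely for illustration): because every byte of a UTF-8 multi-byte sequence has its high bit set, an ASCII byte such as a tab can only ever be a literal tab, so byte-level splitting never cuts a character in half and requires no UTF validation.

```python
# Byte-level field splitting of UTF-8 text. In UTF-8, continuation and lead
# bytes of multi-byte sequences are all >= 0x80, so the ASCII tab byte 0x09
# can only occur as a real tab character. Splitting on it is safe without
# decoding or validating the stream.
line = "name\tcafé\t名前".encode("utf-8")   # contains multi-byte characters
fields = line.split(b"\t")                  # pure byte-level operation

assert fields == ["name".encode(), "café".encode("utf-8"), "名前".encode("utf-8")]

# Only when a field must be processed as characters do we decode, and only
# at that point does the invalid-UTF handling policy matter.
decoded = [f.decode("utf-8") for f in fields]
assert decoded == ["name", "café", "名前"]
```

This is the pattern the comment describes: the byte-mode pass needs no UTF policy at all, while the decode step is where detection and handling decisions apply.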