On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
On 5/26/17 11:20 AM, John Colvin wrote:
On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
[...]
This version also has the advantage of being (discounting any
bugs in iopipe) correct for arbitrary Unicode in all common UTF
encodings.
I worked a lot on making sure this works properly. However,
it's possible that there are some lingering issues.
I also did not spend much time optimizing these paths (whereas
I spent a ton of time getting the UTF-8 line parsing as fast as
it could be). Partly because finding things other than UTF-8 in
the wild is rare, and partly because I have nothing to compare
it with to know what is possible :)
-Steve
If you want UCS-2 (aka UTF-16 without surrogates) data, I can give
you gigabytes of files in TMX format.
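
In case it helps with picking test files: here is a rough sketch (Python, not D, and not part of iopipe) of how one might check that a UTF-16 file really is UCS-2-clean, i.e. contains no surrogate pairs. The function name `is_ucs2_clean` is made up for illustration, and it assumes the whole file fits in memory.

```python
def is_ucs2_clean(data: bytes, encoding: str = "utf-16") -> bool:
    """Return True if the decoded text has no code points above U+FFFF.

    Code points above U+FFFF are exactly the ones UTF-16 must encode as a
    surrogate pair, so their absence means the data is also valid UCS-2.
    Malformed input (e.g. a lone surrogate) raises UnicodeDecodeError.
    """
    text = data.decode(encoding)
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    bmp_only = "h\u00e9llo".encode("utf-16")        # all characters in the BMP
    with_pair = "a\U0001F600".encode("utf-16")      # U+1F600 needs a surrogate pair
    print(is_ucs2_clean(bmp_only))
    print(is_ucs2_clean(with_pair))
```

For gigabyte-sized files one would want to decode in chunks instead (for example with `codecs.getincrementaldecoder`), but the check itself stays the same.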