On Thursday, 22 October 2015 at 18:40:06 UTC, anonymous wrote:
On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:
How do I convert a `string` containing Unicode escape
sequences such as "\uXXXX" into UTF-8?
Ali explained that "\uXXXX" in a D string literal is already UTF-8.
But if you actually want to interpret such escape sequences
from user input or some such, then find all occurrences, and
for each of them do:
Yep, that's exactly what I want to do.
I want to use this to correctly decode DBpedia downloads, since they
encode Unicode characters with these sequences.
* Drop the backslash and the 'u'.
* Parse XXXX as a hexadecimal integer, and cast to dchar.
* Use std.utf.encode to convert to UTF-8. std.conv.to can
probably do it too, and possibly more simply, but would allocate.
Also be aware of the longer variant with a capital U:
\UXXXXXXXX (8 Xs)
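The steps above could be sketched roughly as follows. This is only a minimal sketch, not something taken from Phobos: the name `decodeUnicodeEscapes` is made up for illustration. It assumes that an escape is always a backslash followed by `u` and exactly 4 hex digits, or `U` and exactly 8 hex digits, and that a truncated escape should be copied through unchanged.

```d
import std.array : appender;
import std.conv : to;
import std.utf : encode;

/// Hypothetical helper (not part of Phobos): replaces \uXXXX and
/// \UXXXXXXXX escape sequences in `input` with their UTF-8 encoding.
string decodeUnicodeEscapes(string input)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < input.length)
    {
        // Look for a backslash followed by 'u' (4 digits) or 'U' (8 digits).
        if (input[i] == '\\' && i + 1 < input.length
            && (input[i + 1] == 'u' || input[i + 1] == 'U'))
        {
            immutable size_t digits = input[i + 1] == 'u' ? 4 : 8;
            immutable size_t start = i + 2; // drop the backslash and the 'u'
            if (start + digits <= input.length)
            {
                // Parse XXXX as a hexadecimal integer.
                // Throws ConvException if the digits are not valid hex.
                immutable code = input[start .. start + digits].to!uint(16);

                // Encode the code point as UTF-8 into a stack buffer.
                // Throws UTFException for invalid code points, which
                // includes lone surrogates such as \uD83D.
                char[4] buf;
                immutable len = encode(buf, cast(dchar) code);
                result.put(buf[0 .. len]);

                i = start + digits;
                continue;
            }
        }
        // Not an escape (or a truncated one): copy the byte unchanged.
        result.put(input[i]);
        ++i;
    }
    return result.data;
}
```

With this sketch, decodeUnicodeEscapes(`caf\u00E9`) should give "café". Invalid hex digits or invalid code points surface as exceptions rather than being skipped silently; whether that is the right behaviour for DBpedia data is a judgment call.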
Hmm, why isn't this already in Phobos?