On Thursday, 22 October 2015 at 18:40:06 UTC, anonymous wrote:
On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:

How do I convert a `string` containing Unicode escape sequences such as "\uXXXX" into UTF-8?

Ali explained that "\uXXXX" written in a D string literal is already UTF-8.

But if you actually want to interpret such escape sequences from user input or some such, then find all occurrences, and for each of them do:

Yep, that's exactly what I want to do.

I want to use this to correctly decode DBpedia downloads, since they encode Unicode characters with these sequences.

* Drop the backslash and the 'u'.
* Parse XXXX as a hexadecimal integer, and cast to dchar.
* Use std.utf.encode to convert to UTF-8. std.conv.to can probably do it too, and possibly more simply, but it would allocate.

Also be aware of the longer variant with a capital U: \UXXXXXXXX (8 Xs).
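
The steps above could be sketched roughly as follows. This is only a minimal sketch, not something that exists in Phobos: `decodeUnicodeEscapes` and `allHex` are hypothetical names made up for illustration. It assumes the input is otherwise valid UTF-8, leaves malformed or out-of-range escapes (including lone surrogates) untouched, and does not treat an escaped backslash before the `u` specially.

```d
import std.array : appender;
import std.ascii : isHexDigit;
import std.conv : to;
import std.utf : encode, isValidDchar;

/// Hypothetical helper: true if every code unit in `s` is a hex digit.
bool allHex(const(char)[] s)
{
    foreach (char c; s) // iterate code units, no auto-decoding
        if (!isHexDigit(c))
            return false;
    return true;
}

/// Hypothetical helper: replaces \uXXXX and \UXXXXXXXX with the UTF-8
/// encoding of the code point they name.
string decodeUnicodeEscapes(string input)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < input.length)
    {
        if (input[i] == '\\' && i + 1 < input.length
            && (input[i + 1] == 'u' || input[i + 1] == 'U'))
        {
            // 'u' is followed by 4 hex digits, 'U' by 8.
            immutable size_t digits = input[i + 1] == 'u' ? 4 : 8;
            immutable size_t start = i + 2; // drop the backslash and the 'u'
            if (start + digits <= input.length
                && allHex(input[start .. start + digits]))
            {
                // Parse the digits as a hexadecimal integer.
                immutable code = input[start .. start + digits].to!uint(16);
                if (isValidDchar(cast(dchar) code))
                {
                    // Encode into a stack buffer, so no extra allocation here.
                    char[4] buf;
                    immutable len = encode(buf, cast(dchar) code);
                    result.put(buf[0 .. len]);
                    i = start + digits;
                    continue;
                }
            }
        }
        // Not a well-formed escape: copy the code unit through unchanged.
        result.put(input[i]);
        ++i;
    }
    return result.data;
}
```

For example, `decodeUnicodeEscapes(r"caf\u00E9")` should give `"café"`. The appender still allocates for the result; only the per-character encoding avoids it.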

Hmm, why isn't this already in Phobos?
