Re: Converting Unicode Escape Sequences to UTF-8

Nordlöw via Digitalmars-d-learn Thu, 22 Oct 2015 12:15:47 -0700

On Thursday, 22 October 2015 at 18:40:06 UTC, anonymous wrote:

On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:
How do I convert a `string` containing Unicode escape sequences such as "\uXXXX" into UTF-8?
Ali explained that "\uXXXX" in a D string literal is already UTF-8.
But if you actually want to interpret such escape sequences from user input or some such, then find all occurrences, and for each of them do:


Yep, that's exactly what I want to do.

I want to use this to correctly decode DBpedia downloads, since DBpedia encodes its Unicode characters with these sequences.

* Drop the backslash and the 'u'.
* Parse XXXX as a hexadecimal integer, and cast to dchar.
* Use std.utf.encode to convert to UTF-8. std.conv.to can probably do it
too, and possibly more simply, but would allocate.
Also be aware of the longer variant with a capital U: \UXXXXXXXX (8 Xs).
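
For illustration, the steps above could be sketched roughly as follows. This is only a minimal sketch, not something from Phobos: the function name decodeUnicodeEscapes is made up for this example, and it assumes every backslash followed by 'u' or 'U' starts an escape (so an escaped backslash such as "\\u0041" is not treated specially) and that no UTF-16 surrogate pairs are used.

```d
import std.array : appender;
import std.conv : to;
import std.utf : encode;

/// Hypothetical helper (not part of Phobos): replaces \uXXXX and
/// \UXXXXXXXX escape sequences in `input` with their UTF-8 encoding.
string decodeUnicodeEscapes(string input)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < input.length)
    {
        // Number of hex digits expected after the marker; 0 means "no escape here".
        size_t digits = 0;
        if (input[i] == '\\' && i + 1 < input.length)
        {
            if (input[i + 1] == 'u')
                digits = 4;
            else if (input[i + 1] == 'U')
                digits = 8;
        }

        if (digits != 0 && i + 2 + digits <= input.length)
        {
            // Drop the backslash and the 'u'/'U', parse the rest as hexadecimal.
            immutable code = input[i + 2 .. i + 2 + digits].to!uint(16);

            // Encode the code point into a stack buffer, so only the
            // appender allocates.
            char[4] buf;
            immutable len = encode(buf, cast(dchar) code);
            result.put(buf[0 .. len]);
            i += 2 + digits;
        }
        else
        {
            // Not an escape (or a truncated one): copy the code unit through unchanged.
            result.put(input[i]);
            ++i;
        }
    }
    return result.data;
}

void main()
{
    import std.stdio : writeln;

    // Backtick (WYSIWYG) strings keep the backslashes literal.
    writeln(decodeUnicodeEscapes(`caf\u00e9`));
    writeln(decodeUnicodeEscapes(`smile: \U0001F600`));
}
```

As far as I know, to!uint throws a ConvException when the digits are not valid hexadecimal, and std.utf.encode throws a UTFException for code points that are not valid Unicode (surrogates, or anything above 0x10FFFF), so malformed input surfaces as an exception rather than as silently corrupted output.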


Hmm, why isn't this already in Phobos?
