Converting a string to a slice of runes gives you the individual code points, with invalid byte sequences replaced by the replacement character (U+FFFD) as necessary. Converting a slice of runes into a string gives you the UTF-8 representation. So sanitizing a string should be as simple as string([]rune(someString)). This is O(n) and incurs allocations. Going to and from []byte is another conversion and copy.
There may be a more efficient way directly on a byte slice.

//jb

On 22 Oct 2017, at 17:21, Juliusz Chroboczek <[email protected]> wrote:

I'm probably missing something obvious, but I've looked through the standard library to no avail. How do I sanitise a []byte to make sure it's a UTF-8 string by replacing all incorrect sequences by the replacement character (or whatever)?

I've found unicode/utf8.Valid, which tells me if a []byte is a UTF-8 string, but I don't see a convenient function that I can use on the string before I pass it to the frontend that requires well-formed UTF-8.

Thanks,

-- Juliusz

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
