Re: [go-nuts] Sanitising a UTF-8 string

Jakob Borg Sun, 22 Oct 2017 09:46:17 -0700

Converting a string to a slice of runes gives you the individual code points, 
with the replacement character (U+FFFD) substituted for each invalid byte. 
Converting a slice of runes into a string gives you the UTF-8 representation. 
So sanitising a string should be as simple as string([]rune(someString)). This 
is O(n) and incurs allocations for the intermediate rune slice and the new 
string. Going to and from []byte is a further conversion and copy.
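
A minimal sketch of that round trip, starting from a []byte as in the original 
question. The helper name sanitise is made up for illustration and is not 
something from the standard library:

```go
package main

import "fmt"

// sanitise is a hypothetical helper. It converts the bytes to a string,
// decodes that into runes (each invalid byte becomes U+FFFD), and
// re-encodes the runes as UTF-8.
func sanitise(b []byte) string {
	return string([]rune(string(b)))
}

func main() {
	in := []byte("a\xffb") // 0xff can never appear in well-formed UTF-8
	out := sanitise(in)
	fmt.Println(out == "a\uFFFDb")
}
```

Note that a genuine U+FFFD already present in the input passes through 
unchanged, so the output alone does not tell you whether anything was replaced.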


There may be a more efficient way directly on a byte slice.
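
One possible shape for that, as a sketch only: check utf8.Valid first so that 
well-formed input is returned without copying, and otherwise walk the slice 
with utf8.DecodeRune. The name sanitiseBytes is an assumption for illustration, 
and I have not benchmarked this against the rune-slice round trip:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sanitiseBytes is a hypothetical helper. It returns b itself when b is
// already valid UTF-8, and otherwise a new slice in which every invalid
// byte has been replaced by the encoding of U+FFFD.
func sanitiseBytes(b []byte) []byte {
	if utf8.Valid(b) {
		return b // fast path: nothing to fix, no allocation
	}
	out := make([]byte, 0, len(b)+utf8.UTFMax)
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			// Invalid byte: emit the replacement character instead.
			out = append(out, "\uFFFD"...)
		} else {
			// Valid sequence (including a real U+FFFD): copy it as-is.
			out = append(out, b[:size]...)
		}
		b = b[size:]
	}
	return out
}

func main() {
	out := sanitiseBytes([]byte("a\xffb"))
	fmt.Println(string(out) == "a\uFFFDb")
}
```

Go releases from 1.13 onwards also provide bytes.ToValidUTF8 and 
strings.ToValidUTF8, which replace each run of invalid bytes with a 
replacement of your choice; that is one replacement per run, not one per byte 
as in the sketch above.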

//jb

On 22 Oct 2017, at 17:21, Juliusz Chroboczek wrote:

I'm probably missing something obvious, but I've looked through the
standard library to no avail.  How do I sanitise a []byte to make sure
it's a UTF-8 string by replacing all incorrect sequences by the
replacement character (or whatever)?

I've found unicode/utf8.Valid, which tells me if a []byte is a UTF-8
string, but I don't see a convenient function that I can use on the string
before I pass it to the frontend that requires well-formed UTF-8.

Thanks,

-- Juliusz

--
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
[email protected]<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.
