Re: [go-nuts] Sanitising a UTF-8 string

andrey mirtchovski Sun, 22 Oct 2017 08:57:07 -0700

See the section "For statements with range clause" in the spec:
https://golang.org/ref/spec#For_statements

"For a string value, the "range" clause iterates over the Unicode code
points in the string starting at byte index 0. On successive
iterations, the index value will be the index of the first byte of
successive UTF-8-encoded code points in the string, and the second
value, of type rune, will be the value of the corresponding code
point. If the iteration encounters an invalid UTF-8 sequence, the
second value will be 0xFFFD, the Unicode replacement character, and
the next iteration will advance a single byte in the string."
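
A minimal sketch of how that behaviour could be used to sanitise a
[]byte, assuming it is acceptable to replace every invalid byte with
its own U+FFFD. The helper name "sanitize" is made up for this
example; it is not a standard library function.

```go
package main

import (
	"bytes"
	"fmt"
)

// sanitize is a hypothetical helper, not part of the standard library.
// It returns a copy of b as a string in which every byte that does not
// belong to a valid UTF-8 sequence is replaced by U+FFFD.
func sanitize(b []byte) string {
	var buf bytes.Buffer
	// Ranging over a string decodes one code point per iteration.
	// On invalid UTF-8 the rune is 0xFFFD and the loop advances by a
	// single byte, so each bad byte yields one replacement character.
	for _, r := range string(b) {
		buf.WriteRune(r)
	}
	return buf.String()
}

func main() {
	in := []byte("ab\xffcd") // 0xff can never appear in valid UTF-8
	fmt.Printf("%+q\n", sanitize(in))
}
```

Note that a U+FFFD already present in the input is passed through
unchanged, so the result is always well-formed UTF-8 but does not
record which characters were replaced.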

On Sun, Oct 22, 2017 at 8:29 AM, Juliusz Chroboczek <[email protected]> wrote:
> I'm probably missing something obvious, but I've looked through the
> standard library to no avail.  How do I sanitise a []byte to make sure
> it's a UTF-8 string by replacing all incorrect sequences by the
> replacement character (or whatever)?
>
> I've found unicode/utf8.Valid, which tells me if a []byte is a UTF-8
> string, but I don't see a convenient function that I can use on the string
> before I pass it to the frontend that requires well-formed UTF-8.
>
> Thanks,
>
> -- Juliusz
>
> --
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

