As far as I can tell, neither Elixir nor Erlang has a built-in function for replacing invalid sequences in Unicode data. The Unicode standard suggests a method for handling this on this page <https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf#page=153>. Several other languages (Go <https://pkg.go.dev/bytes#ToValidUTF8>, Python <https://docs.python.org/3/library/stdtypes.html#bytes.decode>, C# <https://github.com/dotnet/docs/issues/13547>, etc.) now follow this recommendation.
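To illustrate the behavior being asked for, here is a minimal sketch using Python's `bytes.decode`, one of the implementations linked above. The helper names `to_valid_utf8` and `decodes_strictly` are made up for this example and are not part of any library; the point is only to contrast replacing invalid bytes with U+FFFD against a strict decode that raises.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def to_valid_utf8(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, substituting U+FFFD for
    invalid sequences instead of raising."""
    return data.decode("utf-8", errors="replace")


def decodes_strictly(data: bytes) -> bool:
    """Hypothetical helper: report whether a strict decode succeeds.
    This mirrors the "crash on invalid input" alternative."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    raw = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8
    print(decodes_strictly(raw))      # strict decoding rejects the input
    print(to_valid_utf8(raw))         # lossy decoding keeps the valid parts
```

Valid input passes through unchanged, so a function like this can be applied unconditionally at a system boundary (for example, before JSON encoding).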
Invalid Unicode is encountered frequently enough that I think it's worth incorporating a solution into Elixir itself. The present alternatives for handling invalid Unicode (and JSON by extension <https://github.com/michalmuskala/jason/issues/174>) are:

- Crashing (not ideal in many cases)
- Rolling your own (a lot of overhead for accidental complexity)
- Depending on a package (+1 package towards dependency hell)

This is my college try <https://github.com/Moosieus/UniRecover/tree/main>, but I'm certain there's a performant and far cleaner solution to be had in pure Elixir. If not, perhaps this is a request for OTP.