[elixir-core:11548] [Proposal] U+FFFD Substitution of Maximal Subparts

Cameron Duley Thu, 05 Oct 2023 18:24:33 -0700

As far as I can tell, neither Elixir nor Erlang has a built-in function 
for replacing invalid sequences in Unicode data with U+FFFD. The Unicode 
Standard suggests a method for handling this, "U+FFFD Substitution of 
Maximal Subparts", on this page 
<https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf#page=153>. 
Several other languages (Go 
<https://pkg.go.dev/bytes#ToValidUTF8>, Python 
<https://docs.python.org/3/library/stdtypes.html#bytes.decode>, C# 
<https://github.com/dotnet/docs/issues/13547>, etc.) now follow this approach.
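
To make the behaviour concrete, here is a minimal sketch using Python's 
built-in decoder, one of the implementations linked above. The helper name 
to_valid_utf8 and the sample bytes are only illustrative assumptions, not a 
proposed Elixir API. Each maximal ill-formed subpart becomes a single U+FFFD, 
while well-formed text passes through unchanged.

```python
def to_valid_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for ill-formed sequences.

    Hypothetical helper name; it only wraps the built-in "replace"
    error handler of bytes.decode.
    """
    return data.decode("utf-8", errors="replace")


# Illustrative input: valid ASCII letters mixed with truncated multi-byte
# sequences (F1 80 80, E1 80, C2) and stray continuation bytes (80, 80 BF).
sample = bytes.fromhex("61 F1 80 80 E1 80 C2 62 80 63 80 BF 64")
result = to_valid_utf8(sample)

# Show the resulting code points, e.g. U+0061 for "a", U+FFFD for each
# replaced subpart.
print([f"U+{ord(ch):04X}" for ch in result])
```

Note that the truncated sequence F1 80 80 collapses to one U+FFFD rather than 
three, whereas the two stray continuation bytes 80 BF yield one each.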


Invalid Unicode is encountered frequently enough that I think it's worth 
incorporating a solution into Elixir itself.

The current options for handling invalid Unicode (and, by extension, JSON 
<https://github.com/michalmuskala/jason/issues/174>) are:

   - Crashing (not ideal in many cases)
   - Rolling your own (a lot of overhead for accidental complexity)
   - Depending on a package (+1 package towards dependency hell)

This is my own attempt <https://github.com/Moosieus/UniRecover/tree/main>, 
but I'm certain there's a performant and far cleaner solution to be had in 
pure Elixir. If not, perhaps this is a request for OTP.

