1.

String.normalize should support the NFKC and NFKD Unicode normalization forms.

Reference: https://www.unicode.org/reports/tr15/

These are particularly useful for generating "machine identifiers" from user 
input, such as usernames.
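As a sketch of the use case: String.normalize/2 only accepts :nfc and :nfd today, but Erlang/OTP 20+ already exposes the compatibility forms via the :unicode module, so the identifier flow could look like this (the example string and pipeline are illustrative, not a proposed API):

```elixir
# NFKC-fold a username into a machine identifier.
# NFKC maps compatibility characters (fullwidth letters, circled
# digits, ligatures, ...) onto their canonical ASCII-ish equivalents.
username = "Ｅｌｉｘｉｒ①"   # fullwidth letters + a circled digit

id =
  username
  |> :unicode.characters_to_nfkc_binary()   # Erlang/OTP 20+
  |> String.downcase()

# id == "elixir1"
```

Having String.normalize(:nfkc) delegate to this would keep the whole flow inside the String module.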

2.

The second part, which is independent but related, is support for Unicode 
transliteration.

Basically, this is a "non-destructive" Unicode-to-ASCII conversion: characters 
are replaced with ASCII approximations rather than simply dropped.

There is an Elixir library that does this:
https://github.com/fcevado/unidecode 

and a JavaScript example:
https://github.com/pid/speakingurl

Also some discussion on the forum:
https://elixirforum.com/t/how-to-replace-accented-letters-with-ascii-letters/539/8
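The forum thread above mostly converges on a partial technique — decompose to NFD, then strip combining marks — which handles accents but not characters like "ß" or "œ" that need a mapping table (what unidecode provides). A minimal sketch of that partial approach, using only what String already ships (the module and function names are hypothetical):

```elixir
defmodule Translit do
  # Accent stripping only, NOT full transliteration:
  # decompose to NFD so "é" becomes "e" + a combining acute accent,
  # then delete the combining marks (Unicode category Mn).
  def strip_accents(string) do
    string
    |> String.normalize(:nfd)
    |> String.replace(~r/\p{Mn}/u, "")
  end
end

Translit.strip_accents("Crème Brûlée")
# "Creme Brulee"
```

A core implementation would need the CLDR-style mapping tables on top of this, which is exactly the part each library currently reinvents.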

My thinking is that all of those libraries do it a bit differently, 
because, well, Unicode is hard.
And precisely because Unicode is so hard, I think this should be implemented 
at the language level (or in a core library) so it is done right and properly 
supported.
It might not matter much for English readers, but for other languages it is 
something you will implement eventually, and often poorly.

Some references:
http://cldr.unicode.org/index/cldr-spec/transliteration-guidelines

To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/d2839fb2-984c-4bcf-b8fd-c891c8c24c83%40googlegroups.com.