As an example of what proper compliance with the Unicode spec would require, check out the "Derived Property: Alphabetic" code point list in this document:
ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

"Total code points: 110943"

And that's just for the "is_alphabetic?" function! (Sure, the clauses would be generated by macros, but as Eric said, it would definitely increase the binary size further...)

I still think this is useful functionality (and it would likely be many orders of magnitude faster than relying on Regex to determine these things, thanks to Elixir/Erlang's fast function-head pattern matching).

-- Peter Marreck

On Tuesday, May 3, 2016 at 6:24:33 PM UTC-4, Eric Meadows-Jönsson wrote:
>
> The problem is that the Unicode module is already big; its .beam file is
> one of the largest in Elixir. There are also issues compiling this file on
> systems with 512 MB of memory. idna, an Erlang library for Unicode, has
> similar issues on systems with low memory. Adding more functions that need
> a large number of function clauses will make the issue worse and make the
> compiled Elixir we distribute larger.
>
> I think it's better to have this functionality in a library until we can
> solve the memory issue, and to have only the bare necessities for Unicode
> support in stdlib. If we can later move it into stdlib, it would be good to
> have the API figured out and bugs fixed in another library that can iterate
> faster.
>
> On Tue, May 3, 2016 at 11:29 PM, eksperimental <[email protected]> wrote:
>
>> I'm not too sure all those functions should be added. There could be too
>> many of them, and they would not be easy to extend.
>> But how about a Unicode.info/1 function that returns a tuple with
>> information about the character, such as
>> iex> Unicode.info("A")
>> {:alphanumeric, :uppercase, :ascii}
>>
>> It will be easy to improve as we find more information that can be added,
>> such as ISO types and other groups (especially for encodings we are not
>> familiar with).
>>
>> Additionally we could have check?/2 (or some better name probably!)
>> iex> Unicode.check?("A", :uppercase)
>> true
>> iex> Unicode.check?("A", :numeric)
>> false
>>
>> On Tue, 3 May 2016 12:31:44 -0700 (PDT) [email protected] wrote:
>>
>> > I have seen multiple people (in the Elixir Slack group
>> > <https://elixir-lang.slack.com/archives/general/p1462294660007855>,
>> > on Reddit
>> > <https://www.reddit.com/r/elixir/comments/4h4y4e/whats_missing_from_the_elixir_ecosystem/d2nvbwd>)
>> > during the last couple of days requiring something that checks if a
>> > (possibly long) string contains e.g. only alphanumeric characters.
>> >
>> > It is possible to do this using regular expressions right now:
>> > ~r/[^[:alnum:]]/u
>> >
>> > but this is very slow.
>> >
>> > My proposal is to add the following boolean functions to the String
>> > module:
>> >
>> > - alphabetic?
>> > - numeric?
>> > - alphanumeric?
>> > - whitespace?
>> > - uppercase?
>> > - lowercase?
>> > - control_character?
>> >
>> > Function heads for these functions can probably be best generated by
>> > using compile-time macros similar to what other unicode-based
>> > functions already use.
>> >
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elixir-lang-core/20160504042910.57fd86e0.eksperimental%40autistici.org.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Eric Meadows-Jönsson
