As an example of what full compliance with the Unicode spec would require, 
check out the "Derived Property: Alphabetic" code point list in this file:

ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

"Total code points: 110943"

And that's just for the "is_alphabetic?" function! (Sure, the clauses would 
be generated by macros, but as Eric said, they would definitely increase the 
binary size further.)
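To give a feel for the compile-time work involved, here is a minimal Elixir sketch that collapses the lines for one property into `{first, last}` code point ranges. It assumes the file's usual line format (`0041..005A    ; Alphabetic # ...` for a range, a single hex code point otherwise, `#` starting a comment). The module name `DerivedPropsSketch` is hypothetical, and the inline sample stands in for the real downloaded file.

```elixir
defmodule DerivedPropsSketch do
  # Hypothetical helper: parses DerivedCoreProperties.txt-style lines,
  # e.g. "0041..005A ; Alphabetic # ...", into {first, last} ranges.

  # Returns the {first, last} code point ranges listed for `property`.
  def ranges(text, property) do
    text
    |> String.split("\n")
    |> Enum.map(&strip_comment/1)
    |> Enum.flat_map(&parse_line(&1, property))
  end

  # Drops everything after "#" and trims surrounding whitespace.
  defp strip_comment(line) do
    line |> String.split("#", parts: 2) |> hd() |> String.trim()
  end

  # Blank and comment-only lines contribute nothing.
  defp parse_line("", _property), do: []

  defp parse_line(line, property) do
    case String.split(line, ";") |> Enum.map(&String.trim/1) do
      [codepoints, ^property] -> [parse_codepoints(codepoints)]
      _other_property -> []
    end
  end

  # "0041..005A" -> {0x41, 0x5A}; "00AA" -> {0xAA, 0xAA}
  defp parse_codepoints(codepoints) do
    case String.split(codepoints, "..") do
      [first, last] -> {String.to_integer(first, 16), String.to_integer(last, 16)}
      [single] -> {String.to_integer(single, 16), String.to_integer(single, 16)}
    end
  end
end

sample = """
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
002B          ; Math # Sm       PLUS SIGN
"""

DerivedPropsSketch.ranges(sample, "Alphabetic")
# → [{65, 90}, {170, 170}]
```

Because the file lists contiguous runs as ranges, it has far fewer entries than the 110943 code points it covers, so generating one clause per range rather than per code point is one possible way to keep the compiled module smaller.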

I still think this is useful functionality, and it would likely be many 
orders of magnitude faster than relying on a regex to determine these things, 
thanks to Elixir/Erlang's fast function-head pattern matching.
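The function-head pattern matching approach can be illustrated with a minimal sketch: a `for` comprehension at module level generates one guarded clause per code point range at compile time, and the string is then walked one UTF-8 code point at a time. The module name `AlphaSketch` and its four-range table are hypothetical (a real table would come from DerivedCoreProperties.txt), and this only shows the shape of the approach; it has not been benchmarked against a regex.

```elixir
defmodule AlphaSketch do
  # Hypothetical subset of Alphabetic ranges; a real module would load
  # the full list from DerivedCoreProperties.txt at compile time.
  @ranges [{?A, ?Z}, {?a, ?z}, {0xAA, 0xAA}, {0xC0, 0xD6}]

  # Generates one guarded function clause per range at compile time.
  for {first, last} <- @ranges do
    defp alpha_codepoint?(cp) when cp >= unquote(first) and cp <= unquote(last), do: true
  end

  defp alpha_codepoint?(_cp), do: false

  @doc "Returns true if every code point of `string` falls in one of @ranges."
  def alphabetic?(string) when is_binary(string), do: all_alpha?(string)

  # Walks the binary one UTF-8 code point at a time.
  defp all_alpha?(<<>>), do: true
  defp all_alpha?(<<cp::utf8, rest::binary>>), do: alpha_codepoint?(cp) and all_alpha?(rest)
  # Bytes that are not valid UTF-8 are treated as non-alphabetic.
  defp all_alpha?(_invalid), do: false
end

AlphaSketch.alphabetic?("Hello")
# → true
AlphaSketch.alphabetic?("abc123")
# → false
```

Note that the empty string returns `true` here (all of its zero code points are alphabetic); a real API would need to decide that case explicitly.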

--
Peter Marreck

On Tuesday, May 3, 2016 at 6:24:33 PM UTC-4, Eric Meadows-Jönsson wrote:
>
> The problem is that the Unicode module is already big; its .beam file is 
> one of the largest in Elixir. There are also issues compiling this file on 
> systems with 512 MB of memory. idna, an Erlang library for Unicode, has 
> similar issues on systems with low memory. Adding more functions that need 
> a large number of function clauses will make the issue worse and make the 
> compiled Elixir we distribute larger.
>
> I think it's better to have this functionality in a library until we can 
> solve the memory issue, and to keep only the bare necessities for Unicode 
> support in the stdlib. If we can later move it into the stdlib, it would be 
> good to have the API figured out and the bugs fixed in a separate library 
> that can iterate faster.
>
> On Tue, May 3, 2016 at 11:29 PM, eksperimental <[email protected]> wrote:
>
>> I'm not sure all of those functions should be added. There could be too
>> many of them, and they would not be easy to extend.
>> But how about a Unicode.info/1 function that returns a tuple with
>> information about a character, such as:
>> iex> Unicode.info("A")
>> {:alphanumeric, :uppercase, :ascii}
>>
>> It would be easy to extend as we find more information that can be added,
>> such as ISO types and other groups (especially for encodings we are not
>> familiar with).
>>
>> Additionally, we could have check?/2 (or probably some better name!):
>> iex> Unicode.check?("A", :uppercase)
>> true
>> iex> Unicode.check?("A", :numeric)
>> false
>>
>>
>> On Tue, 3 May 2016 12:31:44 -0700 (PDT)
>> [email protected] wrote:
>>
>> > I have seen multiple people (in the Elixir Slack group
>> > <https://elixir-lang.slack.com/archives/general/p1462294660007855>,
>> > on Reddit
>> > <https://www.reddit.com/r/elixir/comments/4h4y4e/whats_missing_from_the_elixir_ecosystem/d2nvbwd>)
>> > over the last couple of days asking for something that checks whether a
>> > (possibly long) string contains, e.g., only alphanumeric characters.
>> >
>> > It is possible to do this with regular expressions right now, by checking
>> > that the string has no match for:
>> > ~r/[^[:alnum:]]/u
>> >
>> > but this is very slow.
>> >
>> > My proposal is to add the following boolean functions to the String
>> > module:
>> >
>> >
>> >    -  alphabetic?
>> >    -  numeric?
>> >    -  alphanumeric?
>> >    -  whitespace?
>> >    -  uppercase?
>> >    -  lowercase?
>> >    -  control_character?
>> >
>> >
>> > Function heads for these functions can probably best be generated using
>> > compile-time macros, similar to what other Unicode-based functions
>> > already use.
>> >
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/20160504042910.57fd86e0.eksperimental%40autistici.org
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Eric Meadows-Jönsson
>
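The Unicode.info/1 and check?/2 ideas quoted above could look roughly like the following hypothetical, ASCII-only sketch. `UnicodeCheckSketch` is an invented module name, the category atoms come from the quoted examples, and the order of `@info_categories` is an assumption chosen so that `info("A")` reproduces the proposed `{:alphanumeric, :uppercase, :ascii}`.

```elixir
defmodule UnicodeCheckSketch do
  # Hypothetical, ASCII-only sketch of the proposed check?/2 and info/1.
  # Full Unicode coverage would need much larger generated tables.

  # Categories reported by info/1, in the order they appear in the tuple.
  @info_categories [:alphanumeric, :numeric, :uppercase, :lowercase, :ascii]

  @doc "Returns true if the single-character string `char` is in `category`."
  def check?(<<c>>, :uppercase), do: c in ?A..?Z
  def check?(<<c>>, :lowercase), do: c in ?a..?z
  def check?(<<c>>, :numeric), do: c in ?0..?9
  def check?(<<c>>, :ascii), do: c < 128
  def check?(char, :alphabetic), do: check?(char, :uppercase) or check?(char, :lowercase)
  def check?(char, :alphanumeric), do: check?(char, :alphabetic) or check?(char, :numeric)
  # Multi-byte characters and unknown categories end up here and return false.
  def check?(_char, _category), do: false

  @doc "Returns a tuple of every category in @info_categories that `char` belongs to."
  def info(char) do
    @info_categories
    |> Enum.filter(&check?(char, &1))
    |> List.to_tuple()
  end
end

UnicodeCheckSketch.info("A")
# → {:alphanumeric, :uppercase, :ascii}
UnicodeCheckSketch.check?("A", :numeric)
# → false
```

Since the number of matching categories varies per character, a list (or a map of booleans) might be more idiomatic than a variable-size tuple; that is a judgment call for whoever designs the real API.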

