Checking the length of a String requires traversing the whole string. A fairly common use case is checking whether a String is longer or shorter than a given value. I might not care whether the string is 123 456 or 654 321 graphemes long if all I want to know is whether it is longer than 50 000 characters, but the full length still has to be calculated to find out.

I'd propose adding String.longer?, which calculates the length up to a given limit and returns a boolean indicating whether the string is longer than the limit. It could of course also be used to check whether the string is shorter, longer-or-equal-to or shorter-or-equal-to:

> String.length(string) > 10 == String.longer?(string, 10)
> String.length(string) >= 10 == String.longer?(string, 10 - 1)
> String.length(string) <= 10 == !String.longer?(string, 10)
> String.length(string) < 10 == !String.longer?(string, 10 - 1)

I'm not sure about the naming of the function, please suggest a different one if you'd like.

The proposal is implemented here: https://github.com/elixir-lang/elixir/compare/master...alvinlindstam:string-longer. I'd be happy to send a PR if the proposal is accepted.

*Alternatives*

There is nothing stopping a user from implementing this themselves, since next_grapheme_size is public, but I'd guess it's such a hassle that few would do it.

There is no way to use this to check, in a single call, whether a string's length is equal to a certain value or within a given range. We could use a more verbose API or more functions if that is desired.

One alternative is to add an optional limit parameter to String.length, which always returns the string's length if it is below the limit, but returns the limit or some atom if it is longer. It would be slightly more verbose to check the length, but it enables checks for a given value or range (while still preventing unnecessary calculations).

*Benchmarks*

With my implementation, I get the following output from a simple benchmark. String.longer? seems to be a few percent slower than String.length for short strings when the length is not above the limit, but its running time only grows linearly up to the given limit. The tests are named by function, string length and limit.

Warning: The function you are trying to benchmark is super fast, making time measures unreliable! Benchee won't measure individual runs but rather run it a couple of times and report the average back.
Measures will still be correct, but the overhead of running it n times goes into the measurement. Also statistical results aren't as good, as they are based on averages now. If possible, increase the input size so that an individual run takes more than 10μs

Name                           ips          average    deviation    median
string.length, 10              1012798.94   0.99μs     (±305.69%)   0.90μs
string.longer?, 10, 10         1005594.61   0.99μs     (±184.19%)   0.90μs
string.longer?, 10000, 10      896158.77    1.12μs     (±364.52%)   1.00μs
string.longer?, 10000, 5000    2298.79      435.01μs   (±44.71%)    390.00μs
string.length, 10000           1241.32      805.59μs   (±25.77%)    761.00μs

Comparison:
string.length, 10              1012798.94
string.longer?, 10, 10         1005594.61 - 1.01x slower
string.longer?, 10000, 10      896158.77 - 1.13x slower
string.longer?, 10000, 5000    2298.79 - 440.58x slower
string.length, 10000           1241.32 - 815.90x slower

*Further optimizations*

I considered checking the byte_size of the string, hoping to find conditions under which we could say for sure what the result would be.

I planned to return false if the byte_size was below the limit, since that would mean that there are fewer codepoints than the limit. But I'm not sure there are no situations where a codepoint could produce more than one grapheme.

I also planned to return true if the byte_size was more than 4 times the limit, since each codepoint uses at most four bytes. But since a grapheme could use multiple codepoints, it could also use more than four bytes, and I'm not sure what the upper limit is (if there is any).
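
As a rough illustration of the proposal and of the byte_size idea, here is a minimal sketch built on the public String.next_grapheme/1. It is not the code from the linked branch: the module name LengthSketch and the function capped_length/2 (standing in for the "limit parameter" alternative, with :over_limit as the assumed atom) are hypothetical. The early return relies on the fact that every grapheme occupies at least one byte, so a string of at most limit bytes cannot have more than limit graphemes. No matching upper bound is used, because a base character followed by combining marks is a single grapheme that can take more than four bytes.

```elixir
defmodule LengthSketch do
  # Hypothetical module for illustration; not the code from the linked branch.

  # Returns true if `string` has more than `limit` graphemes.
  def longer?(string, limit)
      when is_binary(string) and is_integer(limit) and limit >= 0 do
    # Every grapheme occupies at least one byte, so a string of at most
    # `limit` bytes cannot have more than `limit` graphemes.
    byte_size(string) > limit and capped_length(string, limit) == :over_limit
  end

  # Sketch of the "limit parameter" alternative: returns the exact length
  # when it is at most `limit`, otherwise the atom :over_limit.
  def capped_length(string, limit)
      when is_binary(string) and is_integer(limit) and limit >= 0 do
    count(string, 0, limit)
  end

  # Whole string consumed: `acc` is the exact length.
  defp count("", acc, _limit), do: acc

  # `limit` graphemes counted and input remains: stop traversing here.
  defp count(_string, limit, limit), do: :over_limit

  defp count(string, acc, limit) do
    case String.next_grapheme(string) do
      {_grapheme, rest} -> count(rest, acc + 1, limit)
      nil -> acc
    end
  end
end

LengthSketch.longer?("abc", 2)        # => true
LengthSketch.longer?("abc", 3)        # => false
LengthSketch.capped_length("abc", 5)  # => 3
LengthSketch.capped_length("abc", 2)  # => :over_limit

# A range check (2 to 4 graphemes) with one bounded traversal:
LengthSketch.capped_length("abc", 4) in 2..4  # => true

# One grapheme made of "e" plus five combining acute accents: 11 bytes,
# so "byte_size > 4 * limit" would wrongly report it as longer than 2.
stacked = "e" <> String.duplicate("\u0301", 5)
byte_size(stacked)                    # => 11
LengthSketch.longer?(stacked, 2)      # => false
```

Deriving longer?/2 from capped_length/2 keeps a single traversal routine, and the same routine covers the equality and range checks that a boolean-only function cannot express in one call.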
