Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-30 Thread Nathan Myers
A good name would be size(). That would avoid any confusion over various length definitions, and just indicate how much address space it occupies. Nathan Myers On May 29, 2014 8:11:47 PM Palmer Cox palmer...@gmail.com wrote: Thinking about it more, units() is a bad name. I think a renaming

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-30 Thread Matthieu Monrocq
Except that in C++ std::basic_string::size and std::basic_string::length are synonymous (both return the number of CharTs, which in std::string is also the number of bytes). Thus I am unsure whether this would end up helping C++ developers. Might help others though. On Fri, May 30, 2014 at 2:12

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-30 Thread Kevin Ballard
This is a very long bikeshed for something which there's no evidence is even a problem. I propose that we terminate this thread now. If you believe that .len() needs to be renamed, please go gather evidence that's compelling enough to warrant breaking tradition with practically every

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Bardur Arantsson
On 2014-05-29 07:47, Kevin Ballard wrote: The JavaScript version is quite wrong. Isaac points out that NFC vs NFD can change the result, although that's really an issue with grapheme clusters vs codepoints. More interestingly, JavaScript's idea of string length is wrong for anything outside
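To make the code-unit versus code-point distinction in this message concrete, here is a minimal sketch in current stable Rust (not the 2014 API the thread was written against). It counts the same string three ways: UTF-8 bytes, UTF-16 code units (the unit a UTF-16 based string type counts in), and code points. The helper name `utf16_len` and the sample strings are assumptions made for illustration.

```rust
/// Number of UTF-16 code units needed to encode `s`.
/// Hypothetical helper, named here only for illustration.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    // U+1F600 needs a surrogate pair in UTF-16; U+00EF does not.
    for s in ["h\u{00EF}", "\u{1F600}"] {
        println!(
            "{:?}: {} UTF-8 bytes, {} UTF-16 code units, {} code points",
            s,
            s.len(),
            utf16_len(s),
            s.chars().count()
        );
    }
}
```

The three counts agree only for ASCII text, which is why a bare "length" is ambiguous without naming the unit.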

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Kevin Ballard
On May 28, 2014, at 11:37 PM, Aravinda VK hallimanearav...@gmail.com wrote: I wonder if chars() available for String itself, so that we can avoid running as_slice().chars() This is a temporary issue. Once DST lands we will likely implement Deref<str> for String, which will make all str methods
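For readers on current stable Rust, the following sketch shows what the plan described above looks like in practice: `String` dereferences to `str`, so `str` methods such as `chars()` can be called on a `String` directly, and a `&String` can be passed where a `&str` is expected. The function name `count_code_points` is a hypothetical helper, not something from the thread.

```rust
/// Takes a string slice; a `&String` coerces to `&str` at the call site.
/// Hypothetical helper for illustration.
fn count_code_points(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let owned: String = String::from("h\u{00EF}"); // "hï", precomposed

    // `chars()` is a `str` method, reached through deref on `String`.
    let direct = owned.chars().count();

    // Deref coercion turns `&String` into `&str` here.
    let via_fn = count_code_points(&owned);

    println!("{} {}", direct, via_fn);
}
```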

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Simon Sapin
On 29/05/2014 08:25, Kevin Ballard wrote: This is a temporary issue. Once DST lands we will likely implement Deref<str> for String, which will make all str methods work transparently on String. Until then, is there a reason not to have String implement the StrSlice trait? -- Simon Sapin

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Masklinn
On 2014-05-29, at 08:37 , Aravinda VK hallimanearav...@gmail.com wrote: I think returning the length of the string in bytes is just fine. The confusion arose because I didn't know about the availability of char_len in Rust. Python 2.7 returns the length of a string in bytes; Python 3 returns the number of

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Palmer Cox
What about renaming len() to units()? I don't see len() as a problem, but maybe as a potential source of confusion. I also strongly believe that no one reads documentation if they *think* they understand what the code is doing. Different people will see len(), assume that it does whatever they

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-29 Thread Palmer Cox
Thinking about it more, units() is a bad name. I think a renaming could make sense, but only if something better than len() can be found. -Palmer Cox On Thu, May 29, 2014 at 10:55 PM, Palmer Cox palmer...@gmail.com wrote: What about renaming len() to units()? I don't see len() as a problem,

[rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Aravinda VK
Hi, How do I find the number of characters in a string? The following example returns the byte count instead of the number of characters. use std::string::String; fn main() { let unicode_str = String::from_str("ಅ"); let ascii_str = String::from_str("a"); println!("unicode str: {},

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Masklinn
On 2014-05-28, at 11:10 , Aravinda VK hallimanearav...@gmail.com wrote: Hi, How to find number of characters in a string? Problem 1: define character. Do you mean a glyph? A grapheme cluster? A code point? Composed or decomposed? Problem 2: what use is knowing the length of a string?
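The "composed or decomposed?" question can be illustrated with the standard library alone. The sketch below, written for current stable Rust, compares two spellings of the same visible letter: the precomposed U+00EF, and "i" followed by the combining diaeresis U+0308. The helper `describe` is hypothetical. Counting grapheme clusters is not shown because it needs a third-party crate, which is outside what this sketch assumes.

```rust
/// Returns (UTF-8 byte count, code point count) for `s`.
/// Hypothetical helper for illustration.
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let composed = "\u{00EF}"; // single precomposed code point
    let decomposed = "i\u{0308}"; // base letter + combining mark

    println!("composed:   {:?}", describe(composed));
    println!("decomposed: {:?}", describe(decomposed));

    // The two render alike but are different code point sequences,
    // so a plain comparison treats them as different strings.
    println!("equal: {}", composed == decomposed);
}
```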

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Aravinda VK
Thanks. I didn't know about char_len. `unicode_str.as_slice().char_len()` gives the number of code points. Sorry for the confusion; I was referring to a code point as a character in my mail. char_len gives the correct output for my requirement. I have written a JavaScript script to convert from string
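`as_slice()` and `char_len()` belong to the 2014 API and are not available in current stable Rust. As a rough modern equivalent, assuming a recent toolchain, the byte count comes from `len()` and the code point count from `chars().count()`. The helper names `byte_len` and `code_point_len` below are hypothetical and chosen only to label the two measurements.

```rust
/// Size of the UTF-8 encoding in bytes (what `len()` reports).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of code points; this walks the whole string.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let unicode_str = String::from("\u{0C85}"); // KANNADA LETTER A
    let ascii_str = String::from("a");

    println!(
        "unicode str: {} bytes, {} code points",
        byte_len(&unicode_str),
        code_point_len(&unicode_str)
    );
    println!(
        "ascii str: {} bytes, {} code points",
        byte_len(&ascii_str),
        code_point_len(&ascii_str)
    );
}
```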

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
I think that the naming of `len` here is dangerously misleading. Naive ASCII-users will be free to assume that this is counting codepoints rather than bytes. I'd prefer the name `byte_len` in order to make the behavior here explicit. On Wed, May 28, 2014 at 5:55 AM, Simon Sapin

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Daniel Micay
On 28/05/14 10:07 AM, Benjamin Striegel wrote: I think that the naming of `len` here is dangerously misleading. Naive ASCII-users will be free to assume that this is counting codepoints rather than bytes. I'd prefer the name `byte_len` in order to make the behavior here explicit. It doesn't

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Simon Sapin
On 28/05/2014 15:13, Daniel Micay wrote: On 28/05/14 10:07 AM, Benjamin Striegel wrote: I think that the naming of `len` here is dangerously misleading. Naive ASCII-users will be free to assume that this is counting codepoints rather than bytes. I'd prefer the name `byte_len` in order to make

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
It's .len() because slicing and other related functions work on byte indexes. We've had this discussion before in the past. People expect there to be a .len(), and the only sensible .len() is byte length (because char length is not O(1) and not appropriate for use with most string-manipulation
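The link between `.len()` and byte-indexed slicing can be sketched as follows in current stable Rust. `len()` returns a stored byte count, and slice ranges are byte offsets that must fall on code point boundaries. The helpers `char_starts` and `byte_slice` are hypothetical names introduced for this example.

```rust
/// Byte offsets at which each code point starts.
/// Hypothetical helper for illustration.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}

/// Slices by byte range; yields None if an endpoint is out of range
/// or falls inside a multi-byte code point.
fn byte_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "h\u{00EF}"; // "hï": 1 byte for 'h', 2 bytes for 'ï'

    println!("len = {}", s.len());
    println!("starts = {:?}", char_starts(s));
    println!("{:?}", byte_slice(s, 0, 1)); // ends on a boundary
    println!("{:?}", byte_slice(s, 0, 2)); // ends mid code point
    println!("{:?}", byte_slice(s, 1, s.len()));
}
```

Using `get` rather than the `&s[a..b]` indexing form keeps the sketch from panicking when an offset lands inside a code point.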

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
People expect there to be a .len() This is the assumption that I object to. People expect there to be a .len() because strings have been fundamentally broken since time immemorial. Make people type .byte_len() and be explicit about their desire to index via code units. On Wed, May 28, 2014 at

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
Breaking with established convention is a dangerous thing to do. Being too opinionated (regarding opinions that deviate from the norm) tends to put people off the language unless there's a clear benefit to forcing the alternative behavior. In this case, there's no compelling benefit to naming

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
Being too opinionated (regarding opinions that deviate from the norm) tends to put people off the language unless there's a clear benefit to forcing the alternative behavior. We have already chosen to be opinionated by enforcing UTF-8 in our strings. This is an extension of that break with

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Thad Guidry
Benjamin seems to say that folks won't read the docs and we need to make the syntax more helpful.. Kevin seems to say that we need to keep the syntax simple and just teach folks to read the docs. I think I would agree with both of them overall for a language design goal that Rust wants to

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
On May 28, 2014, at 11:55 AM, Benjamin Striegel ben.strie...@gmail.com wrote: Being too opinionated (regarding opinions that deviate from the norm) tends to put people off the language unless there's a clear benefit to forcing the alternative behavior. We have already chosen to be

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
There's no clear tradition regarding strings. Excellent, then surely nobody has any right to expect a method named .len() :) Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple concept. I don't think we can fully divorce these two ideas. Understanding UTF-8 still

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
On May 28, 2014, at 1:26 PM, Benjamin Striegel ben.strie...@gmail.com wrote: Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple concept. I don't think we can fully divorce these two ideas. Understanding UTF-8 still implies understanding the difference between

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
Do you honestly believe Yes. Anyone who comes to Rust expecting there to be a .len() method on strings has demonstrated that they fundamentally misunderstand what strings are. Correcting them will be a learning experience, to their benefit. more verbose, annoying, unconventional names I

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Huon Wilson
On 29/05/14 06:38, Kevin Ballard wrote: On May 28, 2014, at 1:26 PM, Benjamin Striegel ben.strie...@gmail.com wrote: Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple concept. I don't think we can fully divorce these two ideas.

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
On May 28, 2014, at 3:24 PM, Huon Wilson dbau...@gmail.com wrote: Changing the names of methods on strings seems very similar to how Path does not implement Show (except with even stronger motivation, because strings have at least 3 sensible interpretations of what the length could be). I

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Benjamin Striegel
Oh and while we're belligerently bikeshedding, we should rename `to_str` to `to_string` once we rename `StrBuf` to `String`. :) On Wed, May 28, 2014 at 9:00 PM, Benjamin Striegel ben.strie...@gmail.com wrote: but people will still end up calling the *exact same method* ...Except when they

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
On May 28, 2014, at 6:00 PM, Benjamin Striegel ben.strie...@gmail.com wrote: To reiterate, it simply doesn't make sense to ask what the length of a string is. You may as well ask what color the string is, or where the string went to high school, or how many times the string rode the roller

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Bardur Arantsson
On 2014-05-29 05:36, Kevin Ballard wrote: [--snip--] And when dealing with a sequence in a precise encoding, the natural unit to work with is the code unit (and this has precedent in other languages, such as JavaScript, Obj-C, and Go). JavaScript: $ node var s = "hï"; // Note the

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Isaac Dupree
Hi all, I don't suggest seeing Javascript as a great example for Rust. It uses UTF-16, but was created back when UTF-16 was UCS-2, so two-code-unit codepoints are poorly supported in Javascript (e.g. you can't use them in regex character classes). On 05/29/2014 12:16 AM, Bardur Arantsson

Re: [rust-dev] How to find Unicode string length in rustlang

2014-05-28 Thread Kevin Ballard
On May 28, 2014, at 9:16 PM, Bardur Arantsson s...@scientician.net wrote: Rust: $ cat fn main() { let l = "hï".len(); // Note the accent println!("{:u}", l); } $ rustc hello.rs $ ./hello 3 No matter how defective the notion of length may be, personally I think that