Re: [rust-dev] How to find Unicode string length in rustlang

Nathan Myers Fri, 30 May 2014 05:13:24 -0700

A good name would be size(). That would avoid any confusion over the various definitions of length, and would just indicate how much address space the string occupies.
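
To make the distinction concrete, here is a minimal sketch. It assumes current Rust rather than the 2014 API discussed in this thread (where the code point count was char_len()). The helper names byte_len and code_point_len are hypothetical, chosen only for illustration.

```rust
/// Number of bytes in the UTF-8 encoding; this is what `str::len()` reports.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (code points, surrogates excluded).
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Plain ASCII, "héllo" with a precomposed é (U+00E9),
    // and a single astral-plane character (U+1D11E, musical G clef).
    let samples = ["hello", "h\u{e9}llo", "\u{1D11E}"];
    for s in samples.iter() {
        println!("{}: {} bytes, {} code points", s, byte_len(s), code_point_len(s));
    }
}
```

The two counts agree only for pure ASCII input, which is why a casual test can leave either interpretation of len() looking correct.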


Nathan Myers


On May 29, 2014 8:11:47 PM Palmer Cox <palmer...@gmail.com> wrote:

Thinking about it more, units() is a bad name. I think a renaming could
make sense, but only if something better than len() can be found.

-Palmer Cox


On Thu, May 29, 2014 at 10:55 PM, Palmer Cox <palmer...@gmail.com> wrote:

> What about renaming len() to units()?
>
> I don't see len() as a problem, but maybe as a potential source of
> confusion. I also strongly believe that no one reads documentation if they
> *think* they understand what the code is doing. Different people will see
> len(), assume that it does whatever they want to do at the moment, and for
> a significant portion of strings that they encounter it will seem like
> their interpretation, whatever it is, is correct. So, why not rename len()
> to something like units()? It's more explicit about the value that it's
> actually producing than len(), and it's not all that much longer to type. As
> stated, exactly what a string is varies greatly between languages, so, I
> don't think that lacking a function named len() is bad. Granted, I would
> expect that many people expect that a string will have a method named len()
> (or length()) and when they don't find one, they will go to the
> documentation and find units(). I think this is a good thing since the
> documentation can then explain exactly what it does.
>
> I much prefer len() to byte_len(), though. byte_len() seems like a bit
> much to type and it seems like all the other methods on strings should then
> be renamed with the byte_ prefix which seems unpleasant.
>
> -Palmer Cox
>
>
> On Thu, May 29, 2014 at 3:39 AM, Masklinn <maskl...@masklinn.net> wrote:
>
>>
>> On 2014-05-29, at 08:37 , Aravinda VK <hallimanearav...@gmail.com> wrote:
>>
>> > I think returning the length of the string in bytes is just fine. Not
>> > knowing about the availability of char_len in Rust is what caused this
>> > confusion.
>> >
>> > Python 2.7 returns the length of the string in bytes; Python 3 returns
>> > the number of codepoints.
>>
>> Nope, depends on the string type *and* on compilation options.
>>
>> * Python 2's `str` and Python 3's `bytes` are byte sequences, their
>>  len() returns their byte counts.
>> * Python 2's `unicode` and Python 3's `str` before 3.3 return a code
>>  units count, which may be UCS2 or UCS4 depending on whether the
>>  interpreter was compiled with `--enable-unicode=ucs2` (the default)
>>  or `--enable-unicode=ucs4`. Only the latter case is a true code points
>>  count.
>> * Python 3.3's `str` switched to the Flexible String Representation,
>>  the build-time option disappeared and len() always returns the number
>>  of codepoints.
>>
>> Note that in no case do len() operations take normalisation or visual
>> composition into account.
>>
>> > JS returns number of codepoints.
>>
>> JS returns the number of UCS2 code units, so each code point in the
>> astral planes counts as two.
>> _______________________________________________
>> Rust-dev mailing list
>> Rust-dev@mozilla.org
>> https://mail.mozilla.org/listinfo/rust-dev
>>
>
>
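
The two caveats raised above (code units versus code points, and normalisation being ignored) can be sketched in Rust as well. This assumes current Rust, not the 2014 API; utf16_len and code_point_len are hypothetical helper names, and the UTF-16 count is only meant to mirror what a JS-style length would report.

```rust
/// Number of UTF-16 code units, the quantity a JS-style `length` counts.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Number of Unicode scalar values (code points, surrogates excluded).
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // One astral-plane code point (U+1D11E) needs a surrogate pair in UTF-16.
    let clef = "\u{1D11E}";
    assert_eq!(code_point_len(clef), 1);
    assert_eq!(utf16_len(clef), 2);

    // The same visible "é", precomposed versus "e" plus a combining accent.
    let composed = "\u{e9}";
    let decomposed = "e\u{301}";
    assert_ne!(composed, decomposed); // no normalisation is applied
    assert_eq!(code_point_len(composed), 1);
    assert_eq!(code_point_len(decomposed), 2);
    assert_eq!(composed.len(), 2); // UTF-8 bytes
    assert_eq!(decomposed.len(), 3); // UTF-8 bytes

    println!("all counts match the expectations above");
}
```

None of these counts corresponds to what a reader would call one character on screen; that would need grapheme cluster segmentation, which is not part of this sketch.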



