Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

Christian Theune Tue, 12 Jan 2016 11:11:28 -0800

Hi,

there are sane approaches to handling strings (encoded bytes) and text
(decoded characters) properly. We might not be able to do this at the moment,
but I find Python 3's bytes/text model quite sane.
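
As a rough illustration of that model (a minimal sketch in plain Python 3, not
a proposal for how Nix should implement anything; the variable names are only
for illustration), the byte count that builtins.stringLength reports and the
code point count come apart like this:

```python
import unicodedata

# "\u00e5" is the precomposed character å (U+00E5), written as an escape
# so the example does not depend on the encoding of this message.
text = "\u00e5"                 # decoded text (str)
encoded = text.encode("utf-8")  # encoded bytes: b'\xc3\xa5'

print(len(encoded))  # 2: bytes, matching what nix-repl printed
print(len(text))     # 1: code points

# The same visible character in decomposed form (NFD) is "a" followed by
# U+030A COMBINING RING ABOVE, so even "number of code points" depends on
# which normalization form the text is in.
decomposed = unicodedata.normalize("NFD", text)
print(len(decomposed))                  # 2: code points
print(len(decomposed.encode("utf-8")))  # 3: bytes
```

Counting user-perceived characters (graphemes) would need more than this; the
standard library's len() only counts code points.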


It might be too much for us to support this with a quick fix, but we should 
keep that on the radar, I guess.

Christian

> On 12 Jan 2016, at 18:26, Jookia <[email protected]> wrote:
> 
> On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
>> Hi,
>> 
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>> 
>>> nix-repl> builtins.stringLength "å"
>>> 2
>> 
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
> 
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
> 
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
> 
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
> 
>> Best Regards,
>> Erik Rybakken
> 
> Cheers,
> Jookia.
> _______________________________________________
> nix-dev mailing list
> [email protected]
> http://lists.science.uu.nl/mailman/listinfo/nix-dev

--
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick


