Hi,

there are sane approaches to keeping strings (encoded bytes) and text (decoded characters) properly apart. We might not be able to adopt one at the moment, but I find Python 3’s bytes/text model quite sane.
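To illustrate what I mean, here is a minimal Python 3 sketch, not a proposal for how Nix should implement anything. The variable names are only for illustration. It shows that the same "å" has a different length depending on whether you count bytes or code points, and that normalization changes the code point count as well:

```python
import unicodedata

text = "\u00e5"                 # decoded text: "å" as a single code point
data = text.encode("utf-8")     # encoded bytes: b'\xc3\xa5'

print(len(text))   # number of code points in the text
print(len(data))   # number of bytes in the UTF-8 encoding

# The same visible character can also be written as "a" plus a
# combining ring (NFD form), which changes the code point count.
decomposed = unicodedata.normalize("NFD", text)
print(len(decomposed))

# Normalizing back to NFC restores the single-code-point form.
assert unicodedata.normalize("NFC", decomposed) == text
```

The byte count matches the 2 that nix-repl reports in the original question. Neither count is the number of user-perceived characters in general.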
It might be too much for us to support this with a quick fix, but we should keep that on the radar, I guess.

Christian

> On 12 Jan 2016, at 18:26, Jookia <[email protected]> wrote:
>
> On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
>> Hi,
>>
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>>
>>> nix-repl> builtins.stringLength "å"
>>> 2
>>
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
>
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
>
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
>
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
>
>> Best Regards,
>> Erik Rybakken
>
> Cheers,
> Jookia.
> _______________________________________________
> nix-dev mailing list
> [email protected]
> http://lists.science.uu.nl/mailman/listinfo/nix-dev

--
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
