* Carl Johnstone <catal...@fadetoblack.me.uk> [2009-11-23 18:50]:
> Aristotle Pagaltzis wrote:
> > Please please don’t make statements like “not in this case”
> > without knowing what the thing you are talking about,
> > i.e. in this case bytes::length, does. There are enough
> > misconceptions about Unicode in Perl already.
>
> As far as the usage of bytes::length. Yes I agree with you that
> the code is wrong as it's taking the byte length of perl's
> internal representation - which happens to be utf-8 and whilst
> correct in that case, isn't for any other character set and
> shouldn't be relied upon.

No: the internal representation can be either of two formats, and
which of the two you get is not reliable, because it’s purely an
implementation detail. It’s never correct. It just accidentally
works much of the time, getting the right answer by using the
wrong method.

> You *do* have to take a byte length of the string in the
> destination character set though

Yes.

> so I'm interested in what the correct solution would be.

Encode the string to the destination encoding (not just character
set), so that the string represents an encoded octet stream, and
then look at the plain old character length of that string. That
will always give you the right answer, regardless of whether that
string is packed bytes or variable-width integers.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
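
The advice in the reply above can be illustrated with a minimal Perl sketch, assuming the core Encode module is available. The helper name encoded_length and the sample string are hypothetical and chosen only for this illustration. The first half shows that bytes::length changes with the internal representation while length does not; the second half encodes first and then takes the ordinary length.

```perl
use strict;
use warnings;
use bytes ();                  # load bytes::length without enabling byte semantics
use Encode ();

# Hypothetical helper: how many octets $text occupies once encoded as $encoding.
sub encoded_length {
    my ($text, $encoding) = @_;
    # FB_CROAK makes unencodable characters fatal instead of silently substituted;
    # LEAVE_SRC keeps encode() from modifying $text.
    my $octets = Encode::encode($encoding, $text,
                                Encode::FB_CROAK | Encode::LEAVE_SRC);
    # Every character of $octets is one octet, so plain length() is the byte count.
    return length $octets;
}

# Sample text: "caf" followed by U+00E9, four characters in total.
my $text = "caf\x{E9}";

# The same four characters can be stored internally in either of two formats.
utf8::downgrade($text);
my $bytes_downgraded = bytes::length($text);   # 4: one byte per character
utf8::upgrade($text);
my $bytes_upgraded   = bytes::length($text);   # 5: U+00E9 now takes two bytes
my $chars            = length $text;           # 4 in both cases

print "bytes::length: $bytes_downgraded (downgraded) vs $bytes_upgraded (upgraded)\n";
print "length: $chars\n";

# Encoding first gives an answer that depends only on the destination encoding.
for my $encoding ('UTF-8', 'ISO-8859-1', 'UTF-16LE') {
    printf "%-10s %d octets\n", $encoding, encoded_length($text, $encoding);
}
```

With this sample the loop reports 5 octets for UTF-8, 4 for ISO-8859-1 and 8 for UTF-16LE, whichever internal format $text happens to be in at the time. A value such as a Content-Length header would be computed the same way, using the encoding the body is actually sent in.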