Re: Character (or byte?) escapes under utf8 pragma

2010-03-11 Thread Juerd Waalboer
an integer ordinal value. What happens is the following: 73 6f a0 65 69 6e a0 4b c3-a4 73 65 (UTF8 flag on) l1 l1 u8 This is wrong. It is a bug. -- Met vriendelijke groet, // Kind regards, // Korajn salutojn, Juerd Waalboer ju...@tnx.nl TNX

Re: Character (or byte?) escapes under utf8 pragma

2010-03-09 Thread Juerd Waalboer
with no Unicode significance. The documentation I referred to is outdated. Sorry for that. Indeed this documentation is wrong. Current documentation, as of Perl version 5.8.9 (December 2008), no longer has this paragraph.

Re: encode from_to error

2009-09-18 Thread Juerd Waalboer
groet, Kind regards, Korajn salutojn, Juerd Waalboer: Perl hacker ##...@juerd.nl http://juerd.nl/sig Convolution: ICT solutions and consultancy sa...@convolution.nl

Re: Unicode characters

2009-05-25 Thread Juerd Waalboer
Andreas J. Koenig wrote 2009-05-25 8:30 (+0200): On Sun, 24 May 2009 10:09:25 +0200, Juerd Waalboer ju...@convolution.nl said: Although it's safe on output, it's better to get used to using :encoding(utf8) instead of :utf8. Using :utf8 on input can cause stability and security

Re: Unicode characters

2009-05-24 Thread Juerd Waalboer

Re: /\w/ match with 'use locale' misses letters in utf8 locale

2008-07-11 Thread Juerd Waalboer
have no idea.

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-12 Thread Juerd Waalboer

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-11 Thread Juerd Waalboer
have to differ on this :-) Yes, although my opinion on this is not strong. undef or replacement character - both are good options. One argument in favor of the replacement character would be backwards compatibility.

Re: Use of encoding/decoding and 3-param open

2007-11-15 Thread Juerd Waalboer
/?node_id=644786 For input, both get the correct characters, assuming the input bytestream was indeed correct. Yes, but if the bytestream is incorrect, you may have a security issue if you used :utf8 instead of :encoding.
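
A minimal sketch of the strict validation that the reply recommends, shown on an in-memory byte string rather than a file handle. The helper name decode_strict is hypothetical and only for illustration; it uses Encode's decode() with FB_CROAK so that malformed input is rejected instead of passed through. For file input, the corresponding idea would be opening with '<:encoding(UTF-8)' rather than '<:utf8'.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): strictly decode UTF-8 bytes,
# returning undef when the input is malformed.
sub decode_strict {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps the caller's byte string unmodified.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars;    # undef if decoding failed
}

my $good = decode_strict("K\xC3\xA4se");    # valid UTF-8 bytes
my $bad  = decode_strict("K\xE4se");        # Latin-1 byte, not valid UTF-8

print defined $good ? "good: accepted\n" : "good: rejected\n";
print defined $bad  ? "bad: accepted\n"  : "bad: rejected\n";
```

The point of the sketch is only that validation happens at the input boundary, so later code never sees a malformed string.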

Re: silent upgrading situations

2007-11-12 Thread Juerd Waalboer
to a float, whenever that is needed.)

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-22 Thread Juerd Waalboer

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-19 Thread Juerd Waalboer
that is neither complete nor accurate, but it provides more information than most documentation does. Unfortunately I lack tuits to send bug reports and make patches.

Re: good name for characters matching [^\0-\377]?

2007-10-18 Thread Juerd Waalboer
Georg Bauhaus wrote 2007-10-18 17:01 (+0200): Isn't it about time to find a good name for crippled character sets with ordinals below 256 only? These are single-byte encodings. I prefer to add the word "legacy" too.

Re: good name for characters matching [^\0-\377]?

2007-10-18 Thread Juerd Waalboer
proof) to work around this problem by using the Unicode::Semantics module's up() function, or the built-in utf8::upgrade().
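
A small sketch of the utf8::upgrade() workaround mentioned above, assuming a string that holds a character in the 128..255 range. The variable names are illustrative. Upgrading changes only the internal representation, so the string still compares equal to its original, while character-class matching then follows Unicode rules.

```perl
use strict;
use warnings;

my $s      = "\xE4";    # one character, ordinal 0xE4
my $before = $s;        # untouched copy for comparison

utf8::upgrade($s);      # switch the internal representation only

print "same characters: ",  ($s eq $before     ? "yes" : "no"), "\n";
print "internally UTF-8: ", (utf8::is_utf8($s) ? "yes" : "no"), "\n";
print "matches \\w: ",      ($s =~ /\w/        ? "yes" : "no"), "\n";
```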

Re: de-utf8-ing a string

2007-10-17 Thread Juerd Waalboer
E R wrote 2007-10-17 15:56 (-0500): for (my $i = 0; $i < length($x); $i++) { $new .= chr(ord(substr($x, $i, 1))); } utf8::downgrade();
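
A short sketch of the suggested replacement for the character-by-character loop, assuming the reply refers to the built-in utf8::downgrade($string). The optional second argument (FAIL_OK) is used here so that a string containing characters above 255 returns false instead of dying. Variable names are illustrative.

```perl
use strict;
use warnings;

my $x = "caf\xE9";
utf8::upgrade($x);                   # force the internal UTF-8 representation

# Downgrade in place; a true second argument (FAIL_OK) means
# "return false on failure" rather than die.
my $ok = utf8::downgrade($x, 1);

my $wide    = "\x{263A}";            # cannot be stored as one byte per character
my $wide_ok = utf8::downgrade($wide, 1);

print "downgraded: ",             ($ok                ? "yes" : "no"), "\n";
print "still internally UTF-8: ", (utf8::is_utf8($x)  ? "yes" : "no"), "\n";
print "wide string downgraded: ", ($wide_ok           ? "yes" : "no"), "\n";
```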

Re: questions about encode/decode

2007-10-15 Thread Juerd Waalboer
of the bytestring is seen as a single ISO-8859-1 character, so a multi-byte UTF8 sequence will *not* be interpreted as a single character. Perhaps helpful: http://tnx.nl/perlunitut,perlunifaq
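
To illustrate the point about undecoded byte strings, here is a minimal sketch using the two-byte UTF-8 sequence for U+00E4. The variable names are illustrative. Until the bytes are decoded, Perl counts each byte as its own character.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA4";                 # UTF-8 encoding of U+00E4, as raw bytes
my $chars = decode('UTF-8', $bytes);    # decoded text string

printf "undecoded length: %d\n", length($bytes);    # 2
printf "decoded length:   %d\n", length($chars);    # 1
printf "first undecoded ordinal: 0x%02X\n", ord(substr($bytes, 0, 1));
printf "decoded ordinal:         0x%02X\n", ord($chars);
```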

Re: questions about encode/decode

2007-10-15 Thread Juerd Waalboer