[farsiweb]Perl 5.8: Now with better Unicode handling (fwd)

Roozbeh Pournader Mon, 22 Jul 2002 00:20:26 -0700

FYI


---------- Forwarded message ----------
Date: Sat, 20 Jul 2002 10:57:02 -0700
From: Paul Hoffman / IMC <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Perl 5.8: Now with better Unicode handling

Perl 5.8 was released yesterday. From the update description:

   New Unicode Semantics (no more `use utf8', almost)

     Previously in Perl 5.6 to use Unicode one would say "use utf8" and then
     the operations (like string concatenation) were Unicode-aware in that
     lexical scope.

     This was found to be an inconvenient interface, and in Perl 5.8 the
     Unicode model has completely changed: now the "Unicodeness" is bound to
     the data itself, and for most of the time "use utf8" is not needed at
     all. The only remaining use of "use utf8" is when the Perl script itself
     has been written in the UTF-8 encoding of Unicode. (UTF-8 has not been
     made the default since there are many Perl scripts out there that are
     using various national eight-bit character sets, which would be illegal
     in UTF-8.)

     See the perluniintro manpage for the explanation of the current model,
     and the utf8 manpage for the current use of the utf8 pragma.
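
As an illustration of "Unicodeness bound to the data" (a minimal sketch,
not part of the quoted release notes): the script below is plain ASCII, so
it does not need "use utf8", yet the string it builds is character data.
The variable names are made up for the example, and the Encode module is
assumed to be available (it ships with Perl 5.8 and later).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Four Arabic-script code points (the Persian word "salam"), written as
# escapes so that the source file itself stays pure ASCII.
my $word = "\x{633}\x{644}\x{627}\x{645}";

# length() counts characters because the string itself is Unicode data.
my $char_count = length $word;

# Turning characters into UTF-8 bytes is an explicit, separate step.
my $bytes      = encode('UTF-8', $word);
my $byte_count = length $bytes;

print "characters: $char_count, bytes: $byte_count\n";
```

This should report 4 characters and 8 bytes, since each of these letters
takes two bytes in UTF-8.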

   New Unicode Properties

     Unicode *scripts* are now supported. Scripts are similar to (and
     superior to) Unicode *blocks*. The difference between scripts and blocks
     is that scripts are the glyphs used by a language or a group of
     languages, while the blocks are more artificial groupings of (mostly)
     256 characters based on the Unicode numbering.

     In general, scripts are more inclusive, but not universally so. For
     example, while the script `Latin' includes all the Latin characters and
     their various diacritic-adorned versions, it does not include the
     various punctuation or digits (since they are not solely `Latin').
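
A small sketch of the point about `Latin' (an illustration only, not from
the release notes): letters, including accented ones, match `\p{Latin}',
while digits, punctuation and letters from other scripts do not. The
sample characters and the %is_latin hash are arbitrary choices for the
example.

```perl
use strict;
use warnings;

# Sample characters: a plain letter, a letter with a macron (U+0101),
# a digit, a punctuation mark, and a Greek letter (U+03B1).
my @samples = ("e", "\x{101}", "5", "!", "\x{3B1}");

my %is_latin;
for my $ch (@samples) {
    $is_latin{$ch} = ($ch =~ /\A\p{Latin}\z/) ? 1 : 0;
    # Print code points rather than the characters themselves, so no
    # output encoding layer is needed.
    printf "U+%04X  Latin script: %s\n", ord($ch), $is_latin{$ch} ? "yes" : "no";
}
```

Only the first two samples are expected to be reported as Latin.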

     A number of other properties are now supported, including `\p{L&}',
     `\p{Any}', `\p{Assigned}', `\p{Unassigned}', `\p{Blank}' [561] and
     `\p{SpacePerl}' [561] (along with their `\P{...}' versions, of course).
     See the perlunicode manpage for details, and more additions.
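
To make two of these properties concrete, here is a hedged sketch (not
from the release notes) that tries `\p{L&}', which covers cased letters,
and `\p{Blank}', which covers horizontal whitespace. The sample characters
and hash names are assumptions chosen for the example.

```perl
use strict;
use warnings;

# \p{L&}: letters that have case (uppercase, lowercase, titlecase).
# Hebrew alef (U+05D0) is a letter but has no case.
my %is_cased;
for my $ch ("A", "a", "1", "\x{5D0}") {
    $is_cased{$ch} = ($ch =~ /\A\p{L&}\z/) ? 1 : 0;
    printf "U+%04X  L&: %d\n", ord($ch), $is_cased{$ch};
}

# \p{Blank}: space and tab, but not line terminators such as newline.
my %is_blank;
for my $ch (" ", "\t", "\n") {
    $is_blank{$ch} = ($ch =~ /\A\p{Blank}\z/) ? 1 : 0;
    printf "U+%04X  Blank: %d\n", ord($ch), $is_blank{$ch};
}
```

The `\P{...}' forms simply invert each test.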

     The `In' or `Is' prefixes to names used with `\p{...}' and `\P{...}'
     are now almost always optional. The only exception is that an `In' prefix
     is required to signify a Unicode block when a block name conflicts with
     a script name. For example, `\p{Tibetan}' refers to the script, while
     `\p{InTibetan}' refers to the block. When there is no name conflict, you
     can omit the `In' from the block name (e.g. `\p{BraillePatterns}'), but
     to be safe, it's probably best to always use the `In'.
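
The script/block distinction can be sketched as follows (an illustration
under stated assumptions, not part of the release notes). It relies on
U+0F48 being an unassigned code point inside the Tibetan block, so it
belongs to the block but to no script. The %samples and %result names are
made up for the example.

```perl
use strict;
use warnings;

my %samples = (
    "TIBETAN LETTER KA (U+0F40)" => "\x{0F40}",
    "unassigned U+0F48"          => "\x{0F48}",
    "LATIN SMALL LETTER A"       => "a",
);

my %result;
for my $name (sort keys %samples) {
    my $ch = $samples{$name};
    $result{$name} = {
        script => ($ch =~ /\A\p{Tibetan}\z/)   ? 1 : 0,  # script test
        block  => ($ch =~ /\A\p{InTibetan}\z/) ? 1 : 0,  # block test
    };
    printf "%-28s script=%d block=%d\n",
        $name, $result{$name}{script}, $result{$name}{block};
}
```

The letter KA should match both tests, the unassigned code point only the
block test, and the Latin letter neither.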



--Paul Hoffman, Director
--Internet Mail Consortium

_______________________________________________
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb
