> but a possible solution I
> see would be to let ASSP internally use a fixed, known encoding and
> using conversions for whatever "external" data so that the internal
> representation of them will be consistent


Yes, that's the solution. The only useful encoding is UTF-8, because this 
is the only encoding used by Perl's regex/index/compare/sort/IO ... engine 
(apart from Latin-1).
Only Perl 5.14 can be configured to use another encoding, but some Perl 
functions are still not Unicode-conformant in 5.14.
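A minimal sketch of the "fixed internal encoding" idea, assuming the core Encode module is used at the boundaries. The helper names to_internal and to_external are hypothetical and not taken from ASSP, and the fallback to ISO-8859-1 for wrongly encoded input is only one possible policy:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw external bytes into a Perl character string.
sub to_internal {
    my ($bytes, $charset) = @_;
    # Strict decode; croaks on malformed input or an unknown charset name.
    my $text = eval {
        Encode::decode($charset, $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    # Assumed policy for wrongly encoded data: reinterpret the bytes as
    # ISO-8859-1, which accepts every byte value.
    $text = Encode::decode('ISO-8859-1', $bytes) unless defined $text;
    return $text;
}

# Hypothetical helper: serialize the internal string as UTF-8 bytes on output.
sub to_external {
    my ($text) = @_;
    return Encode::encode('UTF-8', $text);
}
```

With helpers like these, everything inside the script would see character strings only, and the handling of badly encoded input stays in one place.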

The "unicode_strings" feature was only an attempt to fix it all in an elegant 
and short way. By the way, it works for the needs of ASSP with Perl 5.12, and it 
is enabled automatically by Perl if

use 5.012;

is declared, which prevents the script from running on earlier Perl 
versions.
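To illustrate what the feature changes, here is a small sketch that enables "unicode_strings" lexically inside one sub only. The sub names are made up for the example, and it needs Perl 5.12 or later to compile:

```perl
use strict;
use warnings;

# Without the feature, a string without the UTF-8 flag gets native
# byte semantics, so the byte 0xE0 (a-grave in Latin-1) is not upcased.
sub uc_native {
    return uc $_[0];
}

# With the feature (lexically scoped to this sub), the same byte is
# treated as the Unicode character U+00E0 and is upcased to U+00C0.
sub uc_unicode {
    use feature 'unicode_strings';
    return uc $_[0];
}

printf "native:  %vX\n", uc_native("\xE0");
printf "unicode: %vX\n", uc_unicode("\xE0");
```

The Perl 5.10 workaround code would have to produce the second behavior by other means, for example by upgrading such strings explicitly before case-changing them.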

However, as long as ASSP supports Perl 5.10 we'll need some code to 
correct the Perl 5.10 behavior.
At least this workaround technique has one advantage over 
internal Unicode usage: the script will run faster. Why? That's a long 
story, but a small example will show it.

Assume the simple regex  /\d+/

which searches for numbers. In our small, known encoding world these are 
0...9, but in Unicode many language-specific characters are digits as well. So 
Perl does not only have to search for ten characters; it has to search for 
hundreds of characters across all the scripts used by humans.

From the Perl Unicode documentation: http://perldoc.perl.org/perlunicode.html

As an example, the Unicode properties (character classes) like \p{Nd} are 
known to be quite a bit slower (5-20 times) than their simpler 
counterparts like \d (then again, there are hundreds of Unicode characters 
matching Nd compared with the 10 ASCII characters matching d).
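The difference in what \d accepts can be seen with a short sketch. The sub names are invented for the example; it only demonstrates the matching behavior and makes no claim about timings:

```perl
use strict;
use warnings;

# \d follows the Unicode definition of a decimal digit for character
# strings, so it also accepts digits from other scripts.
sub is_unicode_number { return $_[0] =~ /^\d+$/ ? 1 : 0 }

# An explicit ASCII class only ever has to consider ten characters.
sub is_ascii_number   { return $_[0] =~ /^[0-9]+$/ ? 1 : 0 }

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

print "unicode class: ", is_unicode_number($arabic_three), "\n";
print "ascii class:   ", is_ascii_number($arabic_three), "\n";
```

Writing [0-9] keeps the check to the ten ASCII digits on every supported Perl version, including 5.10.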
 
Thomas


From:    Grayhat <gray...@gmx.net>
To:      assp-test@lists.sourceforge.net
Date:    14.02.2012 19:57
Subject: Re: [Assp-test] Antwort: Re: Antwort: Perl v5.14 
"unicode_strings" feature



> Early Perl versions used ISO-8859-1 to store internal data, for 
> example as the result of a decode(...,...).

[...]

> the resulting string has a mixed encoding. Even if a string is
> internally stored in ISO-8859-1 - if we check the UTF-8 flag of the

Hmmm... maybe I'm missing some details here, but a possible solution I
see would be to let ASSP internally use a fixed, known encoding and
using conversions for whatever "external" data so that the internal
representation of them will be consistent; again, not sure it makes
sense or if I really got the point; if ASSP uses a given internal
encoding and if such encoding allows to deal with different languages
then... be it, just ensure to convert everything to such an encoding;
sure, writing regexp to match a given string may then become tricky,
but as long as it's documented ... :)

> You don't need to look for a solution Andrea - thank you. I know what
> to do. However, the problem is not well-encoded characters; the
> problem is the handling of wrongly encoded characters, that is, as
> always, the exception handling.

Well, Thomas, just trying to help if and as much as possible

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test





