Hi Alex,

You're not the only guy with an accented character in their name ;).

Your problem is not a bug within the Zend Framework (except that it's overrun 
with Anglic overlords ;)), rather it's that you are using Validator rules 
unsafe for accented/multibyte characters without some changes. Personally I 
wish someone would put a warning sticker on those String rules since they are 
not safe to use for internationalised applications and should be avoided. It's 
a weak spot in the framework since strlen() in PHP does not in fact count 
characters (it counts bytes!).

Firstly, Zend_Validate_Alpha places a call to ctype_alpha(). This function is 
likely not reading accented characters as valid alphabetic characters by 
default, either because your currently set locale doesn't understand them or 
because they are encoded in a non-default character set. For example, an 
accented character encoded in UTF-8 and passed into ctype_alpha() with the 
correct locale would still fail until the input string is converted to 
something like ISO-8859-1 (or -15 these days), because ctype_alpha() looks at 
one byte at a time. Check how many bytes any accented character in UTF-8 
contains ;).
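
To make that concrete, here is a small sketch (the variable names are mine, 
and the string is written with explicit byte escapes so the result doesn't 
depend on how the file is saved). It assumes the ctype extension is enabled:

```php
<?php
// "jäger" written as explicit UTF-8 bytes: "ä" is the two bytes 0xC3 0xA4.
$name = "j\xC3\xA4ger";

// strlen() counts bytes, so the single character "ä" reports a length of 2.
$umlautBytes = strlen("\xC3\xA4");

// ctype_alpha() inspects the string one byte at a time. In the "C" locale
// neither 0xC3 nor 0xA4 is alphabetic, so the whole name is rejected.
$isAlpha = ctype_alpha($name);

var_dump($umlautBytes, $isAlpha);
```

The same name with the umlaut removed ("jger") passes, which is why plain 
ASCII names never show the problem.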

At this stage I usually say screw ctype_alpha() and use something where the 
input is more comparable - like preg_match(). So long as the file containing 
the character range is stored as UTF-8 it works perfectly well. So in your case 
you can write a new Validator using something like:

    public function isValid($value)
    {
        $this->_messages = array();

        $valueString = (string) $value;

        /**
         * German alpha range requiring a minimum length of 1 (an empty
         * string will fail).
         * NB: This PHP file must be saved as UTF-8 only. The "u" modifier
         * makes PCRE treat the pattern and the input as UTF-8 characters
         * rather than as single bytes.
         */
        if (!preg_match("/^[a-zA-ZäÄéÉÖöÜüß]{1,}$/u", $valueString)) {
            $this->_messages[] = "'$valueString' must contain only German alphabetic characters";
            return false;
        }

        return true;
    }

The file must be stored as UTF-8 (not any of ISO-8859-1 to -15). Likewise, any 
input being validated must be UTF-8 (or validated as UTF-8), i.e. web pages 
should be served with this header:
Content-Type: text/html;charset=UTF-8
or an equivalent meta tag in the HTML head section, so forms are sent as UTF-8. 
You can also add the accept-charset="utf-8" attribute to your forms (in case 
some browsers need the extra hint).
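
If you want to check that incoming data really is UTF-8 before validating it, 
something along these lines should do. This is only a sketch: 
isWellFormedUtf8() is a name I made up (it is not part of Zend Framework), and 
it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not part of Zend Framework): returns true only if
// $value is well-formed UTF-8. Requires the mbstring extension.
function isWellFormedUtf8($value)
{
    return mb_check_encoding((string) $value, 'UTF-8');
}

$utf8   = "j\xC3\xA4ger"; // "jäger" encoded as UTF-8
$latin1 = "j\xE4ger";     // "jäger" encoded as ISO-8859-1

var_dump(isWellFormedUtf8($utf8));   // bool(true)
var_dump(isWellFormedUtf8($latin1)); // bool(false)
```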

Alternatively the character range can be specified using the PCRE \x{????} 
format, where ???? is the Unicode "code point" (hexadecimal) for that 
character. This form needs the u modifier at the end of the pattern to enable 
PCRE's UTF-8 support. This is probably the better solution if possible - the 
PHP file encoding then matters less, since multibyte characters are written 
as hexadecimal Unicode code points rather than as literal characters.
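
A minimal sketch of the code point form, assuming the same set of characters 
as the literal range earlier (isGermanAlpha() is just an illustrative name, 
not an existing validator):

```php
<?php
// Hypothetical helper illustrating the \x{...} form. The code points are
// ä (E4), Ä (C4), ö (F6), Ö (D6), ü (FC), Ü (DC), ß (DF), é (E9), É (C9).
function isGermanAlpha($value)
{
    // Single quotes so PHP passes \x{...} through to PCRE untouched.
    $pattern = '/^[a-zA-Z\x{E4}\x{C4}\x{F6}\x{D6}\x{FC}\x{DC}\x{DF}\x{E9}\x{C9}]+$/u';

    // preg_match() returns 1 on a match, 0 on no match, and false when the
    // subject is not valid UTF-8, so compare strictly against 1.
    return preg_match($pattern, (string) $value) === 1;
}

var_dump(isGermanAlpha("j\xC3\xA4ger")); // UTF-8 "jäger"
var_dump(isGermanAlpha("j\xE4ger"));     // ISO-8859-1 "jäger", not valid UTF-8
```

Because of the u modifier, input that isn't valid UTF-8 is rejected outright 
instead of being matched byte by byte.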

Secondly, the StringLength validator uses strlen(). This counts bytes, not 
characters. If you write a string like "groß", you have 4 characters, but 
strlen() will tell you there are 5 bytes when it is encoded as UTF-8 (so 
standard strlen() is useless for counting characters once anything accented 
is involved). If you have such characters then you need to use either 
mbstring or iconv to write a new validator. For example, here's an extract of 
one based on mbstring (assuming UTF-8 settings).

// Zps_Validate_StringLength extends Zend_Validate_StringLength

    public function isValid($value)
    {
        $this->_messages = array();
        $valueString = (string) $value;
        /**
         * Can omit the encoding reference if you prefer setting it in
         * php.ini or using mb_internal_encoding() in the bootstrap.
         */
        $length = mb_strlen($valueString, 'UTF-8');
        if ($length < $this->_min) {
            $this->_messages[] = "'$valueString' is less than $this->_min characters long";
        }
        if (null !== $this->_max && $this->_max < $length) {
            $this->_messages[] = "'$valueString' is greater than $this->_max characters long";
        }
        return count($this->_messages) === 0;
    }
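
If you want to see the byte/character difference for yourself, this little 
sketch compares the three functions on "groß" (written as explicit UTF-8 
bytes; it assumes both the mbstring and iconv extensions are available):

```php
<?php
$word = "gro\xC3\x9F"; // "groß" as UTF-8 bytes; "ß" is 0xC3 0x9F

$bytes      = strlen($word);                // counts bytes
$mbChars    = mb_strlen($word, 'UTF-8');    // counts characters (mbstring)
$iconvChars = iconv_strlen($word, 'UTF-8'); // counts characters (iconv)

printf("%d bytes, %d characters\n", $bytes, $mbChars); // 5 bytes, 4 characters
```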

If you add in these two validator changes and apply UTF-8 consistently (save 
all files as UTF-8, output HTML with the UTF-8 charset, and use the 
utf8_general_ci collation for MySQL, or its equivalent for other DBMSs) then 
you're almost there. Just make sure ALL string manipulation/counting functions 
are replaced with their mbstring or iconv equivalents and that mbstring has 
UTF-8 set as its internal encoding.
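
As a rough illustration of why the replacements matter (a sketch only, 
assuming mbstring is loaded; the variable names are mine):

```php
<?php
// Typically done once in the bootstrap so later mb_* calls default to UTF-8.
mb_internal_encoding('UTF-8');

$name = "j\xC3\xA4ger"; // "jäger" as UTF-8 bytes

// substr() works on bytes: asking for 2 "characters" cuts "ä" in half and
// leaves "j" followed by a stray 0xC3 byte.
$broken = substr($name, 0, 2);

// The mbstring equivalents work on characters.
$intact = mb_substr($name, 0, 2); // "jä"
$upper  = mb_strtoupper($name);   // "JÄGER"
```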

Long post, eh? :) Hope it helps some.
 
Pádraic Brady
http://blog.astrumfutura.com
http://www.patternsforphp.com


----- Original Message ----
From: Alexander Jaeger <[EMAIL PROTECTED]>
To: Darby Felton <[EMAIL PROTECTED]>
Cc: Guillaume Millet <[EMAIL PROTECTED]>; Zend Framework General 
<[email protected]>
Sent: Tuesday, May 29, 2007 9:44:21 AM
Subject: Re: [fw-general] Zend_Validate

Hello List,

i waited eagerly to see 1.0 RC1, and here it is ;)

I am still looking for a solution to my Umlauts problem,

i tried using the mbstring settings within php,

[mbstring]
mbstring.language = neutral
mbstring.http_input = iso-88589
mbstring.http_output = iso-8859-1
mbstring.internal_encoding = iso-8859-1
mbstring.encoding_translation = Off

I changed it a couple of times using german as language, and UTF-8 
encoding, also http_input as auto,
no success on that matter.

Then I read the new post @ http://framework.zend.com/issues/browse/ZF-269

Could anyone provide me with an example how to use the

Zend_Filter_Input::getRaw()

for this code:

$var1 = "jäger";

$filter_StripTags->filter($var1);

$alpha = new Zend_Validate();
$alpha->addValidator(new Zend_Validate_StringLength(1, 64));
$alpha->addValidator(new Zend_Validate_Alpha());

if ($alpha->isValid($var1)) {
  echo $var1 . ' is Valid';
} else {
  echo $var1 . ' is not Valid';
}


I guess i am just being stupid... i have no idea how i get that workin.

Help greatly appreciated.

Alex



Darby Felton wrote:
> Hi Alex,
>
> The issue you experience is related to ZF-269:
>
> http://framework.zend.com/issues/browse/ZF-269
>
> Thanks for the report!
>
> Best regards,
> Darby
>
> Alexander Jäger wrote:
>   
>> Sorry to bother,
>>
>> I try to validate a variable containing my last name "jäger".
>>
>> $alpha = new Zend_Validate();
>> $alpha->addValidator(new Zend_Validate_StringLength(1, 64));
>> $alpha->addValidator(new Zend_Validate_Alpha());
>>
>> $var1 = "jäger";
>>
>> if ($alpha->isValid($var1)) {
>>   echo $var1 . ' is Valid';
>> } else {
>>   echo $var1 . ' is not Valid';
>> }
>>
>> // prints: jäger is not Valid
>>
>> If I try the same with $var1='alexander' it is valid.
>>
>> I tried to get the locales working, and they work using
>> setlocale(LC_ALL,"de_DE.UTF-8");
>>
>> I also tried using Zend_Locale, but still no improvement.
>>
>> Can anyone point me in a direction to seek a solution?
>>
>> Please help ;)
>>
>> Alex
>>
>>
>> Alexander Jäger wrote:
>>     
>>> Hello Guillaume,
>>>
>>> that's a simple solution....
>>>
>>> sometimes I don't see the forest, because of all the trees.
>>>
>>> thanks.
>>>
>>> Alex
>>>
>>> Guillaume Millet wrote:
>>>       
>>>> Hi Alex,
>>>>
>>>> If you're trying to validate data like your name using
>>>> Zend_Validate_Alpha, you may want to try and set PHP's locale to
>>>> German using setlocale() if it's not already done.
>>>>
>>>> Regards,
>>>>
>>>> Guillaume
>>>>
>>>> Lx wrote:
>>>>
>>>>         
>>>>> Simon R Jones wrote:
>>>>>
>>>>>           
>>>>>> Hi Alexander,
>>>>>>
>>>>>> Some domain names have been set up to accept international
>>>>>> characters, DE domains being one of them. More info on how to use
>>>>>> it is on
>>>>>> http://framework.zend.com/manual/en/zend.validate.validating_hostnames.html
>>>>>>
>>>>>>
>>>>>> However, there have been reports of it not working reliably for
>>>>>> some people. This is likely down to character encoding issues.
>>>>>> This has been reported in ZF-1083 (
>>>>>> http://framework.zend.com/issues/browse/ZF-1083 ) and I'm looking
>>>>>> into why this isn't working for some people. I had uploaded a test
>>>>>> script to that page but the upload process messed up the character
>>>>>> encoding so that doesn't work.
>>>>>>
>>>>>> I plan to write up some tests and host them myself to help resolve
>>>>>> this. I'll send you a link once I've done this.
>>>>>>
>>>>>> best wishes,
>>>>>> Simon
>>>>>>
>>>>>>   
>>>>>>             
>>>>> Hi Simon,
>>>>>
>>>>> thanks very much for the fast reply,
>>>>>
>>>>> i actually searched for a solution to validate a string containing
>>>>> umlaute such as a name as mine "Jäger" ;)
>>>>>
>>>>> But the example of the domains will help to find a solution to my
>>>>> umlaut problem.
>>>>>
>>>>> Again thank you,
>>>>>
>>>>> Alex
>>>>>
>>>>> P.S. I really love your project, thanks for the effort to build such
>>>>> a perfect framework
>>>>>           
>>>>         








       