Hi Alex,
You're not the only guy with an accented character in their name ;).
Your problem is not a bug within the Zend Framework (except that it's overrun
with Anglic overlords ;)), rather it's that you are using Validator rules
unsafe for accented/multibyte characters without some changes. Personally I
wish someone would put a warning sticker on those String rules since they are
not safe to use for internationalised applications and should be avoided. It's
a weak spot in the framework since strlen() in PHP does not in fact count
characters (it counts bytes!).
Firstly, Zend_Validate_Alpha places a call to ctype_alpha(). This function is
likely not reading accented characters as valid alphabetic characters by
default, either because your currently set locale doesn't understand them or
because they are encoded in a non-default character set. ctype_alpha() checks
the string one byte at a time, so an accented character encoded in UTF-8 would
still fail, even with the correct locale, until the input string is converted
to a single-byte encoding like ISO-8859-1 (or -15 these days). Check how many
bytes any accented character in UTF-8 contains ;).
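As a rough illustration of the byte-counting point, here is a minimal sketch. It writes the umlaut as explicit UTF-8 byte escapes so the result does not depend on how the file is saved, and it pins LC_CTYPE to the plain "C" locale, which is an assumption; your server's locale may differ.

```php
<?php
// "ä" (U+00E4) encoded in UTF-8 is the two bytes 0xC3 0xA4.
$umlaut = "\xC3\xA4";
$name   = "j" . $umlaut . "ger"; // "jäger" in UTF-8

// Assumption: the plain "C" locale, where only ASCII letters count as alphabetic.
setlocale(LC_CTYPE, 'C');

$bytes        = strlen($umlaut);      // strlen() counts bytes, not characters
$asciiIsAlpha = ctype_alpha('jager'); // plain ASCII letters pass
$utf8IsAlpha  = ctype_alpha($name);   // the two bytes of "ä" are not ASCII letters

var_dump($bytes, $asciiIsAlpha, $utf8IsAlpha);
```

The last check is the same one Zend_Validate_Alpha ends up making on "jäger".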
At this stage I usually say screw ctype_alpha() and use something where the
input is more comparable - like preg_match(). So long as the file containing
the character range is stored as UTF-8 it works perfectly well. So in your case
you can write a new Validator using something like:
public function isValid($value)
{
    $this->_messages = array();
    $valueString = (string) $value;

    /**
     * German Alpha range requiring a minimum length of 1 (empty string
     * will fail).
     * NB: This PHP file must be saved as UTF-8 only.
     */
    if (!preg_match("/^[a-zA-ZäÄéÉÖöÜüß]{1,}$/", $valueString)) {
        $this->_messages[] = "'$valueString' does not contain only German alphabetic characters";
        return false;
    }

    return true;
}
The file must be stored as UTF-8 (not any of ISO-8859-1 to -15). Likewise, any
input being validated must be UTF-8 (or validated as UTF-8), i.e. web pages
should be served with this header:
Content-Type: text/html;charset=UTF-8
or an equivalent meta tag in the HTML head section, so that forms are sent as
UTF-8. You can also add the accept-charset="utf-8" attribute to your forms (in
case some browsers need the extra hint).
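If you cannot fully trust that incoming data is UTF-8, one option is to check it before validating. The sketch below is only an illustration: ensureUtf8() is a hypothetical helper name, it assumes the mbstring extension is loaded, and it assumes that anything which is not valid UTF-8 was sent as ISO-8859-1, which may not hold for your users.

```php
<?php
// Hypothetical helper: return $value as UTF-8.
// Assumption: input that is not valid UTF-8 arrived in $fallbackEncoding.
function ensureUtf8($value, $fallbackEncoding = 'ISO-8859-1')
{
    $value = (string) $value;

    // Already well-formed UTF-8: pass it through untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Otherwise reinterpret the bytes using the assumed fallback encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallbackEncoding);
}

$fromUtf8Form   = ensureUtf8("j\xC3\xA4ger"); // "jäger" sent as UTF-8
$fromLatin1Form = ensureUtf8("j\xE4ger");     // "jäger" sent as ISO-8859-1
```

Both calls end up with the same UTF-8 bytes, so the validators only ever see one encoding.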
Alternatively the character range can be specified using the PCRE \x{????}
format, where ???? is the Unicode "code point" (hexadecimal) for that
character. This one needs the u modifier at the end of the pattern (as in
/.../u) to enable PCRE's Unicode support. This is probably the better solution
if possible: the PHP file encoding then becomes less essential, since multibyte
characters are now written as hexadecimal Unicode code points/ranges rather
than as literal bytes in the source file.
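A minimal sketch of the code point approach, assuming the same set of characters as the German range used earlier. isGermanAlpha() is a hypothetical function name for illustration, not something shipped with the framework.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
function isGermanAlpha($value)
{
    // a-z and A-Z plus ä Ä ö Ö ü Ü ß é É, written as Unicode code points.
    // Single quotes keep PHP from touching the \x{...} escapes; the trailing
    // "u" modifier makes PCRE treat both pattern and subject as UTF-8.
    $pattern = '/^[a-zA-Z\x{00E4}\x{00C4}\x{00F6}\x{00D6}\x{00FC}\x{00DC}\x{00DF}\x{00E9}\x{00C9}]+\z/u';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when the subject is not valid UTF-8).
    return preg_match($pattern, (string) $value) === 1;
}

$utf8Name   = isGermanAlpha("j\xC3\xA4ger"); // "jäger" as UTF-8
$latin1Name = isGermanAlpha("j\xE4ger");     // "jäger" as ISO-8859-1 bytes
$empty      = isGermanAlpha('');
```

With the u modifier, input that is not valid UTF-8 is rejected rather than matched byte by byte.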
Secondly, the StringLength validator uses strlen(). This counts bytes, not
characters. If you write a string like "groß", you have 4 characters, but
strlen() will tell you there are 5 bytes in UTF-8 (so standard strlen() is
useless for counting characters once anything accented is involved). If your
input can contain such characters, you need to use either mbstring or iconv to
write a new validator. For example, here's an extract of one based on mbstring
(assuming UTF-8 settings).
// Zps_Validate_StringLength extends Zend_Validate_StringLength
public function isValid($value)
{
    $this->_messages = array();
    $valueString = (string) $value;

    /**
     * Can omit the encoding reference if you prefer setting it in php.ini
     * or using mb_internal_encoding() in the bootstrap.
     */
    $length = mb_strlen($valueString, 'UTF-8');

    if ($length < $this->_min) {
        $this->_messages[] = "'$valueString' is less than $this->_min characters long";
    }
    if (null !== $this->_max && $this->_max < $length) {
        $this->_messages[] = "'$valueString' is greater than $this->_max characters long";
    }

    return count($this->_messages) === 0;
}
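To see the difference in counts for yourself, here is a small sketch comparing the three functions on "groß". It assumes the mbstring and iconv extensions are both loaded, and writes the ß as byte escapes so the file encoding does not matter.

```php
<?php
// "groß" in UTF-8: g, r, o are one byte each, ß (U+00DF) is 0xC3 0x9F.
$word = "gro\xC3\x9F";

$byteCount  = strlen($word);                // counts bytes
$mbCount    = mb_strlen($word, 'UTF-8');    // counts characters (mbstring)
$iconvCount = iconv_strlen($word, 'UTF-8'); // counts characters (iconv)

var_dump($byteCount, $mbCount, $iconvCount);
```

Only the last two match what a user would consider the length of the word.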
If you add in these two validator changes and apply UTF-8 consistently (save all
files as UTF-8, output HTML with the UTF-8 charset, and use the utf8_general_ci
collation for MySQL, or its equivalent for other DBMSs), then you're almost
there. Just make sure ALL string manipulation/counting functions are replaced
with their mbstring or iconv equivalents and that mbstring has UTF-8 set as
its internal encoding.
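As a rough sketch of what that replacement looks like in practice (assuming the mbstring extension is loaded, and using byte escapes for the umlaut so the example does not depend on the file encoding):

```php
<?php
// Typically done once in the bootstrap so the mb_* calls default to UTF-8.
mb_internal_encoding('UTF-8');

$name = "j\xC3\xA4ger"; // "jäger" in UTF-8

$upper  = mb_strtoupper($name);   // multibyte-aware: "JÄGER"
$first2 = mb_substr($name, 0, 2); // first two characters: "jä"
$broken = substr($name, 0, 2);    // first two BYTES: "j" plus half of "ä"

// The byte-based cut leaves a string that is no longer valid UTF-8.
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');
```

The substr() result is the kind of damage that creeps in when a single byte-based function is left in the chain.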
Long post, eh? :) Hope it helps some.
Pádraic Brady
http://blog.astrumfutura.com
http://www.patternsforphp.com
----- Original Message ----
From: Alexander Jaeger <[EMAIL PROTECTED]>
To: Darby Felton <[EMAIL PROTECTED]>
Cc: Guillaume Millet <[EMAIL PROTECTED]>; Zend Framework General
<[email protected]>
Sent: Tuesday, May 29, 2007 9:44:21 AM
Subject: Re: [fw-general] Zend_Validate
Hello List,
i waited eagerly to see 1.0 RC1, and here it is ;)
I am still looking for a solution to my Umlauts problem,
i tried using the mbstring settings within php,
[mbstring]
mbstring.language = neutral
mbstring.http_input = iso-88589
mbstring.http_output = iso-8859-1
mbstring.internal_encoding = iso-8859-1
mbstring.encoding_translation = Off
I changed it a couple of times using german as language, and UTF-8
encoding, also http_input as auto,
no success on that matter.
Then I read the new post @ http://framework.zend.com/issues/browse/ZF-269
Could anyone provide me with an example how to use the
Zend_Filter_Input::getRaw()
for this code:
$var1 = "jäger";
$filter_StripTags->filter($var1);
$alpha = new Zend_Validate();
$alpha->addValidator(new Zend_Validate_StringLength(1, 64));
$alpha->addValidator(new Zend_Validate_Alpha());
if ($alpha->isValid($var1)) {
    echo $var1 . ' is Valid';
} else {
    echo $var1 . ' is not Valid';
}
I guess I am just being stupid... I have no idea how to get that working.
Help greatly appreciated.
Alex
Darby Felton schrieb:
> Hi Alex,
>
> The issue you experience is related to ZF-269:
>
> http://framework.zend.com/issues/browse/ZF-269
>
> Thanks for the report!
>
> Best regards,
> Darby
>
> Alexander Jäger wrote:
>
>> Sorry to bother,
>>
>> I try to validate a variable cotaining my lastname "jäger".
>>
>> $alpha = new Zend_Validate();
>> $alpha->addValidator(new Zend_Validate_StringLength(1, 64));
>> $alpha->addValidator(new Zend_Validate_Alpha());
>>
>> $var1 = "jäger"
>>
>> if ($alpha->isValid($var1) {
>> echo $var1 . 'is Valid';
>> } else {
>> echo $var1 . 'is not Valid';
>> }
>>
>> // prints: jäger is not Valid
>>
>> if itry the same with $var1='alexander' it is valid.
>>
>> I try to get the locales workin and they work using the
>> setlocale(LC_ALL,"de_DE.UTF-8");
>>
>> Also I tries using Zend_Locale, but still no improvement.
>>
>> Can any one point me to an direction to seek a solution.
>>
>> Please help ;)
>>
>> Alex
>>
>>
>> Alexander Jäger schrieb:
>>
>>> Hello Guillaume,
>>>
>>> thats a simple solution....
>>>
>>> somtimes i don´t see the forest, because of all the trees.
>>>
>>> thanks.
>>>
>>> Alex
>>>
>>> Guillaume Millet schrieb:
>>>
>>>> Hi Alex,
>>>>
>>>> If you're trying to validate data like your name using
>>>> Zend_Validate_Alpha, you may want to try and set PHP's locale to
>>>> German using setlocale() if it's not already done.
>>>>
>>>> Regards,
>>>>
>>>> Guillaume
>>>>
>>>> Lx a écrit :
>>>>
>>>>
>>>>> Simon R Jones schrieb:
>>>>>
>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> Some domain names have been set up to accept international
>>>>>> characters, DE domains being one of them. More info on how to use
>>>>>> it is on
>>>>>> http://framework.zend.com/manual/en/zend.validate.validating_hostnames.html
>>>>>>
>>>>>>
>>>>>> However, there have been reports of it not working reliably for
>>>>>> some people. This is likely down to character encoding issues.
>>>>>> This has been reported in ZF-1083 (
>>>>>> http://framework.zend.com/issues/browse/ZF-1083 ) and I'm looking
>>>>>> into why this isn't working for some people. I had uploaded a test
>>>>>> script to that page but the upload process messed up the character
>>>>>> encoding so that doesn't work.
>>>>>>
>>>>>> I plan to write up some tests and host them myself to help resolve
>>>>>> this. I'll send you a link once I've done this.
>>>>>>
>>>>>> best wishes,
>>>>>> Simon
>>>>>>
>>>>>>
>>>>>>
>>>>> Hi Simon,
>>>>>
>>>>> thanks very much for the fast reply,
>>>>>
>>>>> i actually searched for a solution to validate a string containing
>>>>> umlaute such as a name as mine "Jäger" ;)
>>>>>
>>>>> But the example of the domains will help to find a solution to my
>>>>> umlaut problem.
>>>>>
>>>>> Again thank you,
>>>>>
>>>>> Alex
>>>>>
>>>>> P.S. I really love your project, thanks for the effort to build such
>>>>> an perfect framework
>>>>>
>>>>