Johannes,
You're telling me an explicit cast to binary could fail internally but not
externally? That doesn't make a lot of sense somehow.
Externally the user is responsible for selecting the proper encoding;
internally PHP has to guess.
    case 's':
    case 'S':
        {
            char **p = va_arg(*va, char **);
            int *pl = va_arg(*va, int *);
            UConverter *conv = NULL;
<snip />
            switch (Z_TYPE_PP(arg)) {
<snip />
                case IS_UNICODE:
                    /* handle conversion of Unicode to binary with a specific converter */
                    if (conv != NULL) { /* this is an 's' specifier */
                        SEPARATE_ZVAL_IF_NOT_REF(arg);
                        if (convert_to_string_with_converter(*arg, conv) == FAILURE) {
                            return "";
                        }
                        *p = Z_STRVAL_PP(arg);
                        *pl = Z_STRLEN_PP(arg);
                        break;
                    } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
                        return "strictly a binary string";
                    }
                    /* fall through */
I'll try to explain why this isn't useful. First off, you get anomalies like
this:
C:\sandbox\php6\Debug_TS>php -r "echo crc32('');"
Warning: crc32() expects parameter 1 to be strictly a binary string, Unicode
string given in Command line code on line 1
C:\sandbox\php6\Debug_TS>php -r "echo crc32(null);"
0
Second, you don't always get the same value anyway if the encoding changes.
Test script:
echo crc32((binary)'שלום')."\n";
echo crc32((binary)'AKUO');
with the script saved in UTF-8 and
unicode.fallback_encoding=UTF-8
unicode.runtime_encoding=UTF-8
unicode.stream_encoding=UTF-8
output is:
-1600612531
1603041141
with the same script saved in ISO-8859-8 and
unicode.fallback_encoding=ISO-8859-8
unicode.runtime_encoding=ISO-8859-8
unicode.stream_encoding=ISO-8859-8
output is:
-2023737703
1603041141
These are exactly the same results I see under PHP 5, depending on whether the
script is saved in ISO-8859-8 or UTF-8.
Now if I remove the (binary) cast and alter the relevant section of
zend_parse_arg_impl():
    } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
        if (zval_unicode_to_string(*(arg) TSRMLS_CC) == FAILURE) {
            return "strictly a binary string";
        }
    }
    /* fall through */
I get exactly the same results again, with or without the binary cast. All
that changes is I don't get an error when I skip the casting.
If the script encoding doesn't match up with whatever's set in INI, I don't
get as far as that stuff anyway:
Warning: Illegal or truncated character in input: offset 0, state=0 in
C:\sandbox\php-src\Debug_TS\help.php on line 5
Parse error: parse error, expecting `')'' in
C:\sandbox\php-src\Debug_TS\help.php on line 5
- regardless of whether I cast to binary or not, and regardless of whether
I've messed with the src.
- Steph
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php