Johannes,

You're telling me an explicit cast to binary could fail internally but not
externally? That doesn't make a lot of sense somehow.

Externally, the user is responsible for selecting the proper encoding;
internally, PHP has to guess.

 case 's':
 case 'S':
  {
   char **p = va_arg(*va, char **);
   int *pl = va_arg(*va, int *);
   UConverter *conv = NULL;
<snip />
   switch (Z_TYPE_PP(arg)) {
<snip />
    case IS_UNICODE:
     /* handle conversion of Unicode to binary with a specific converter */
     if (conv != NULL) { /* this is an 's' specifier */
      SEPARATE_ZVAL_IF_NOT_REF(arg);
      if (convert_to_string_with_converter(*arg, conv) == FAILURE) {
       return "";
      }
      *p = Z_STRVAL_PP(arg);
      *pl = Z_STRLEN_PP(arg);
      break;
     } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
      return "strictly a binary string";
     }
     /* fall through */

I'll try to explain why this isn't useful. First off, you get anomalies like this:

C:\sandbox\php6\Debug_TS>php -r "echo crc32('');"

Warning: crc32() expects parameter 1 to be strictly a binary string, Unicode string given in Command line code on line 1
C:\sandbox\php6\Debug_TS>php -r "echo crc32(null);"
0

Second, you don't always get the same value anyway if the encoding changes. Test script:

echo crc32((binary)'שלום')."\n";
echo crc32((binary)'AKUO');

with the script saved in UTF-8 and
unicode.fallback_encoding=UTF-8
unicode.runtime_encoding=UTF-8
unicode.stream_encoding=UTF-8

output is:
-1600612531
1603041141

with the same script saved in ISO-8859-8 and
unicode.fallback_encoding=ISO-8859-8
unicode.runtime_encoding=ISO-8859-8
unicode.stream_encoding=ISO-8859-8

output is:
-2023737703
1603041141

These are exactly the same results I see under PHP 5, depending on whether the script is saved in ISO-8859-8 or UTF-8.
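
To make the point above concrete outside PHP, here is a minimal C sketch that runs a standard reflected CRC-32 over the raw bytes of the same four Hebrew letters in each encoding. It assumes PHP's crc32() is the usual polynomial 0xEDB88320 variant; the names crc32_bytes, shalom_utf8, shalom_8859_8, akuo_ascii and print_crcs are mine, not from php-src. The values are printed as unsigned hex, whereas the figures quoted in this mail are signed 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit.
 * Assumption: this is the same variant PHP's crc32() uses. */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* U+05E9 U+05DC U+05D5 U+05DD, encoded two ways. */
const unsigned char shalom_utf8[]   = { 0xD7, 0xA9, 0xD7, 0x9C, 0xD7, 0x95, 0xD7, 0x9D };
const unsigned char shalom_8859_8[] = { 0xF9, 0xEC, 0xE5, 0xED };

/* Plain ASCII has the same bytes in both encodings. */
const unsigned char akuo_ascii[] = { 'A', 'K', 'U', 'O' };

/* Print the three checksums; call this from main() to try it out. */
void print_crcs(void)
{
    printf("UTF-8      : %08lx\n",
           (unsigned long)crc32_bytes(shalom_utf8, sizeof shalom_utf8));
    printf("ISO-8859-8 : %08lx\n",
           (unsigned long)crc32_bytes(shalom_8859_8, sizeof shalom_8859_8));
    printf("ASCII AKUO : %08lx\n",
           (unsigned long)crc32_bytes(akuo_ascii, sizeof akuo_ascii));
}
```

The checksum only ever sees bytes, so the Hebrew string gives a different result per encoding while the ASCII string cannot change. A binary cast does nothing to alter that.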

Now if I remove the (binary) cast and alter the relevant section of zend_parse_arg_impl():

     } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
      if (zval_unicode_to_string(*(arg) TSRMLS_CC) == FAILURE) {
       return "strictly a binary string";
      }
     }
     /* fall through */

I get exactly the same results again, with or without the binary cast. All that changes is I don't get an error when I skip the casting.
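
For anyone who wants the difference between the two versions in isolation, here is a toy C model of just that branch. It is a sketch under loud assumptions: toy_arg, toy_unicode_to_string, parse_S_strict and parse_S_lenient are hypothetical stand-ins, not Zend API, and the toy converter targets ISO-8859-1 rather than whatever unicode.runtime_encoding would select.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_NULL, ARG_BINARY, ARG_UNICODE } arg_type;

/* Hypothetical, heavily simplified stand-in for a zval. */
typedef struct {
    arg_type type;
    const unsigned short *ustr;  /* UTF-16 code units when type == ARG_UNICODE */
    size_t ulen;
    unsigned char buf[64];       /* binary result after conversion */
    size_t len;
} toy_arg;

/* Hypothetical stand-in for zval_unicode_to_string(): converts to
 * ISO-8859-1 and fails on anything that does not fit in one byte. */
int toy_unicode_to_string(toy_arg *arg)
{
    assert(arg != NULL);
    if (arg->ulen > sizeof arg->buf) {
        return -1;
    }
    for (size_t i = 0; i < arg->ulen; i++) {
        if (arg->ustr[i] > 0xFF) {
            return -1;           /* not representable in the target encoding */
        }
        arg->buf[i] = (unsigned char)arg->ustr[i];
    }
    arg->len = arg->ulen;
    arg->type = ARG_BINARY;
    return 0;
}

/* Models the current 'S' branch: any Unicode string is rejected.
 * Returns NULL on success, otherwise the expected-type text. */
const char *parse_S_strict(toy_arg *arg)
{
    assert(arg != NULL);
    if (arg->type == ARG_UNICODE) {
        return "strictly a binary string";
    }
    return NULL;
}

/* Models the altered 'S' branch: try the conversion first and only
 * report an error when the conversion itself fails. */
const char *parse_S_lenient(toy_arg *arg)
{
    assert(arg != NULL);
    if (arg->type == ARG_UNICODE && toy_unicode_to_string(arg) != 0) {
        return "strictly a binary string";
    }
    return NULL;
}
```

In this model an empty Unicode string is rejected by parse_S_strict but accepted by parse_S_lenient, while a NULL argument passes both, which mirrors the crc32('') versus crc32(null) anomaly shown earlier.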

If the script encoding doesn't match up with whatever's set in INI, I don't get as far as that stuff anyway:

Warning: Illegal or truncated character in input: offset 0, state=0 in C:\sandbox\php-src\Debug_TS\help.php on line 5

Parse error: parse error, expecting `')'' in C:\sandbox\php-src\Debug_TS\help.php on line 5

- regardless of whether I cast to binary or not, and regardless of whether I've messed with the src.


- Steph


johannes


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php


