Rob,
I have not tested the patch, but it looks good to me on a cursory
review. I assume it passes your tests?
The only comment I have is regarding the use of the 't' and 'T'
specifiers. Since you always have to pass binary UTF-8 strings to
libxml, we should always use the 's' specifier and let PHP downconvert
Unicode strings based on the runtime encoding (which you set to UTF-8).
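
To illustrate what "downconvert" amounts to when the runtime encoding is UTF-8, here is a minimal standalone sketch of a UTF-16 to UTF-8 encoder. It is not PHP or libxml2 code: the name sketch_utf16_to_utf8 and its signature are hypothetical, and the real conversion is done inside the engine's parameter parsing, not by a helper like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of PHP or libxml2): encode UTF-16 code
 * units as a NUL-terminated UTF-8 byte string.
 * Returns the number of bytes written, excluding the NUL, or (size_t)-1
 * on an unpaired surrogate or if the output buffer is too small. */
size_t sketch_utf16_to_utf8(const uint16_t *in, size_t in_len,
                            char *out, size_t out_cap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_cap == 0) {
        return (size_t)-1;              /* no room even for the NUL */
    }

    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the low surrogate after it. */
            if (i + 1 >= in_len) return (size_t)-1;
            uint32_t low = in[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* lone low surrogate */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > out_cap) return (size_t)-1;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return o;
}
```

The point is that the extension only ever sees the resulting UTF-8 bytes, which is what libxml expects, regardless of whether the caller passed a Unicode or a binary string.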
-Andrei
On Jul 17, 2006, at 2:57 PM, Rob Richards wrote:
Attached is a patch for my initial cut for unicode and XML (made
against the /ext directory).
I started with XMLReader since it was the smallest.
The code can probably be optimized a bit, but I want to make sure this
is how it should be because the changes made here will be the changes
needed for the rest of the XML based extensions (simplexml, xsl,
xmlwriter, and xml to a point).
It includes the following:
Macros defined in php_libxml.h (names can be changed if anyone has
a problem with them).
ZVAL_XML_STRING(z, s, flags)
RETVAL_XML_STRING(s, flags)
These are used to take the UTF-8 output from libxml2 functions
and return the correct string (UTF-16 when running in unicode mode, or
UTF-8 when not).
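
As a rough illustration of the conversion such a macro would need in unicode mode, the following standalone sketch decodes UTF-8 into UTF-16 code units. It is not the macro body from the patch: the name sketch_utf8_to_utf16 and its signature are hypothetical, and it assumes well-formed input of the kind libxml2 produces (it does not reject overlong forms or encoded surrogates). In non-unicode mode the UTF-8 bytes would simply be returned as they are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): decode UTF-8 bytes into
 * UTF-16 code units. Returns the number of code units written, or
 * (size_t)-1 on a bad lead/continuation byte, truncated input, or an
 * output buffer that is too small. */
size_t sketch_utf8_to_utf16(const unsigned char *in, size_t in_len,
                            uint16_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len) {
        uint32_t cp;
        size_t extra;                   /* continuation bytes expected */
        unsigned char b = in[i++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;         /* invalid lead byte */

        if (extra > in_len - i) return (size_t)-1;  /* truncated input */

        while (extra--) {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF) return (size_t)-1;

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (o + 2 > out_cap) return (size_t)-1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (o + 1 > out_cap) return (size_t)-1;
            out[o++] = (uint16_t)cp;
        }
    }
    return o;
}
```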
XMLReader:
In order to maintain BC with PHP 5, it accepts both unicode and binary
strings (UTF-8, as in PHP 5) as parameters. The parameters can be mixed
(some unicode, some binary); all strings are converted to UTF-8 as
needed to work with libxml2.
In order to only require 1 hash table for properties, the
following is used in MINIT:
zend_u_hash_init(&xmlreader_prop_handlers, 0, NULL, NULL, 1,
(zend_bool)zend_ini_long("unicode.semantics",
sizeof("unicode.semantics"), 1));
Tests have been updated for unicode mode.
Let me know if anyone sees any problems with these changes.
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php