Rob,

I have not tested the patch, but at a cursory glance it looks good to me. I assume it passes your tests? My only comment concerns the use of the 't' and 'T' specifiers. Since you always have to pass binary UTF-8 strings to libxml, we should always use the 's' specifier and let PHP downconvert Unicode strings according to the runtime encoding (which you set to UTF-8).
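A minimal sketch of the kind of downconversion described above, assuming the runtime encoding is UTF-8. It is a standalone C function that encodes UTF-16 code units as UTF-8 bytes. The name utf16_to_utf8 is hypothetical and is not a PHP or libxml2 API. Inside an extension, the point of using 's' is that the engine performs this step itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PHP API): encode UTF-16 code units as UTF-8
 * into out, which holds out_cap bytes. Returns the number of bytes written,
 * or (size_t)-1 on an unpaired surrogate or insufficient output space. */
static size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                            unsigned char *out, size_t out_cap)
{
    assert(out != NULL || out_cap == 0);
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 >= in_len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                   /* not followed by a low one */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            return (size_t)-1;
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > out_cap)
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

For example, encoding { 0x00E9, 0x20AC, 0xD83D, 0xDE00 } ("é", "€", and U+1F600 as a surrogate pair) yields the 9 bytes C3 A9 E2 82 AC F0 9F 98 80. That byte string is what libxml2 would expect to receive.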

-Andrei

On Jul 17, 2006, at 2:57 PM, Rob Richards wrote:

Attached is a patch with my initial cut at Unicode support for XML (made against the /ext directory).
I started with XMLReader since it is the smallest.
The code can probably be optimized a bit, but I want to make sure this is the right approach first, because the same changes will be needed in the rest of the XML-based extensions (simplexml, xsl, xmlwriter, and, to a point, xml).

It includes the following:
Macros defined in php_libxml.h (names can be changed if anyone has a problem with them).
       ZVAL_XML_STRING(z, s, flags)
       RETVAL_XML_STRING(s, flags)
These take the UTF-8 output from libxml2 functions and return the correct string type (UTF-16 when running in Unicode mode, UTF-8 when not).
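As a rough illustration of the choice the proposed ZVAL_XML_STRING/RETVAL_XML_STRING macros make, here is a standalone C sketch. It takes UTF-8 bytes (as libxml2 returns them) and produces a UTF-16 copy in Unicode mode or a plain UTF-8 copy otherwise. The xml_str type, utf8_next, and xml_string_result are hypothetical stand-ins, not the macros' actual implementation. The real macros would build zvals through the engine's own conversion routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string zval: UTF-16 in Unicode mode,
 * UTF-8 bytes otherwise. */
typedef struct {
    int is_unicode;        /* 1: u16 is set, 0: bytes is set */
    unsigned char *bytes;  /* NUL-terminated UTF-8 copy */
    uint16_t *u16;         /* 0-terminated UTF-16 copy */
    size_t len;            /* byte count or UTF-16 code-unit count */
} xml_str;

/* Decode one UTF-8 sequence at s[*i]; returns the code point, or -1 if the
 * sequence is truncated, malformed, overlong, a surrogate, or > U+10FFFF. */
static long utf8_next(const unsigned char *s, size_t len, size_t *i)
{
    static const long min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    unsigned char c = s[*i];
    size_t n;
    long cp;
    if (c < 0x80) { (*i)++; return c; }
    else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
    else return -1;
    if (*i + n >= len) return -1;                 /* truncated sequence */
    for (size_t k = 1; k <= n; k++) {
        if ((s[*i + k] & 0xC0) != 0x80) return -1; /* not a continuation byte */
        cp = (cp << 6) | (s[*i + k] & 0x3F);
    }
    if (cp < min_cp[n] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return -1;
    *i += n + 1;
    return cp;
}

/* Hypothetical analogue of ZVAL_XML_STRING. Returns 0 on success, -1 on
 * invalid UTF-8 or allocation failure. */
static int xml_string_result(const unsigned char *s, size_t len,
                             int unicode_mode, xml_str *out)
{
    assert(out != NULL && (s != NULL || len == 0));
    memset(out, 0, sizeof *out);
    out->is_unicode = unicode_mode;
    if (!unicode_mode) {                          /* non-Unicode mode: UTF-8 copy */
        out->bytes = malloc(len + 1);
        if (!out->bytes) return -1;
        if (len) memcpy(out->bytes, s, len);
        out->bytes[len] = '\0';
        out->len = len;
        return 0;
    }
    /* UTF-16 never needs more code units than UTF-8 has bytes. */
    out->u16 = malloc((len + 1) * sizeof *out->u16);
    if (!out->u16) return -1;
    size_t i = 0, o = 0;
    while (i < len) {
        long cp = utf8_next(s, len, &i);
        if (cp < 0) { free(out->u16); out->u16 = NULL; return -1; }
        if (cp >= 0x10000) {                      /* encode as a surrogate pair */
            cp -= 0x10000;
            out->u16[o++] = (uint16_t)(0xD800 + (cp >> 10));
            out->u16[o++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out->u16[o++] = (uint16_t)cp;
        }
    }
    out->u16[o] = 0;
    out->len = o;
    return 0;
}
```

For example, the UTF-8 bytes 61 C3 A9 F0 9F 98 80 ("a", "é", U+1F600) become the four UTF-16 units 0061 00E9 D83D DE00 in Unicode mode and stay as the same seven bytes otherwise. Rejecting invalid input here is a design choice of the sketch. libxml2 output is normally well-formed UTF-8, but failing loudly seems safer than returning a mangled string.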

   XMLReader:
To maintain BC with PHP 5, it accepts both Unicode and binary strings (UTF-8, as in PHP 5) as parameters. The parameters can be mixed (some Unicode, some binary); all strings are converted to UTF-8 before being passed to libxml2.

To require only one hash table for properties, MINIT initializes it as follows, taking the last argument from the unicode.semantics INI setting:
       zend_u_hash_init(&xmlreader_prop_handlers, 0, NULL, NULL, 1, (zend_bool)zend_ini_long("unicode.semantics", sizeof("unicode.semantics"), 1));

Tests have been updated for Unicode mode.

Let me know if anyone sees any problems with these changes.


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
