On Friday, June 27, 2003 6:01 PM, Philippe Verdy <[EMAIL PROTECTED]> wrote:
Given that XML will require normalization for texts identified as Unicode-encoded (UTF-8 and others), couldn't a document be labelled with an "ISO-10646-8" encoding name (for the UTF-8 encoding scheme) so that the normalization step is skipped in XML processing? The label would then indicate that the document as a whole does not follow Unicode normalization, while still using the same character repertoire. (This would make a processing step optional for XML parsers: normalization would be applied only on input, not during internal processing, and not on output.)

Is this too tricky for the XML conformance requirements? Whose standard would have to be adapted?

In my view, a document can fully conform to ISO 10646 without conforming to Unicode if it does not want to use the /implied/ Unicode properties, such as combining classes and the Unicode normalization forms (and there are certainly other normalizations that could be useful for individual languages). The caveat is that font layout engines would become more complex, needing larger tables for combining sequences, if texts could be encoded without first being normalized.

-- Philippe.
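The message doesn't spell out what the normalization step actually does. The following is a minimal illustrative sketch using Python's standard `unicodedata` module, assuming Python 3.8+ for `is_normalized`. It is not taken from the message or from any XML specification. Two canonically equivalent spellings of "é" are different code point sequences until they are normalized. NFD also reorders combining marks by their canonical combining class, which is the Unicode property the message refers to.

```python
import unicodedata

# Two spellings of "é": precomposed U+00E9, and "e" followed by
# U+0301 COMBINING ACUTE ACCENT. They are canonically equivalent.
precomposed = "\u00e9"
decomposed = "e\u0301"

# Without normalization, a plain code-point comparison sees two different strings.
print(precomposed == decomposed)  # False

# NFC normalization composes "e" + U+0301 into the single code point U+00E9.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Canonical combining classes: U+0323 COMBINING DOT BELOW is class 220,
# U+0301 COMBINING ACUTE ACCENT is class 230.
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0301"))  # 220 230

# NFD sorts adjacent combining marks by class, so the dot below moves first.
marks_out_of_order = "a\u0301\u0323"
print(unicodedata.normalize("NFD", marks_out_of_order) == "a\u0323\u0301")  # True

# A processor can check whether text is already normalized before
# deciding to run the normalization step (Python 3.8+).
print(unicodedata.is_normalized("NFC", decomposed))   # False
print(unicodedata.is_normalized("NFC", precomposed))  # True
```

Text left unnormalized under a label like the proposed "ISO-10646-8" would keep sequences such as `decomposed` and `marks_out_of_order` as written. Any later comparison or rendering would therefore have to handle the equivalent forms itself.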

