Hi All,
I am working on a bug where 0xA is not treated as whitespace by the
Uniconv390TransService::isSpace method, which causes sanityTest.pl failures
in XSValueTest (the failure is a result of an earlier change I made to
update the Unicode level). Looking at the various transcoders we have, I
found that they don't give consistent results for what a space is (some use
a hard-coded table, others make calls to isspace or iswspace). For example,
the Windows transcoder uses iswspace, which says the following are spaces:
0x9, 0xA, 0xB, 0xC, 0xD, 0x20, 0x2000 - 0x200B, 0x3000, 0xFEFF (other
transcoders give similar but not identical results). However, it seems to
me that as an XML parser we should be treating spaces according to the XML
standard:
(#x20 | #x9 | #xD | #xA)+
So I propose that we leave the existing transcoders' isSpace methods as they
are (with a fix for Uniconv390 so that it treats 0xA as whitespace) and
modify the places that call the transcoders' isSpace routine to call
another routine which checks for XML space instead. The callers of the
routine are XMLString::trim, XMLString::tokenize, XMLURL::parse and
XMLBigInteger.
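
To make the proposal concrete, here is a rough sketch of what such a routine
could look like. The names isXMLSpace and trimXMLSpace are hypothetical and
are not existing Xerces-C++ APIs, and char16_t / std::u16string stand in for
XMLCh strings only to keep the example self-contained.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an existing Xerces-C++ API): returns true only
// for the four characters allowed by the XML whitespace production
// (#x20 | #x9 | #xD | #xA), independent of platform or locale.
inline bool isXMLSpace(char16_t ch)
{
    return ch == 0x20 || ch == 0x9 || ch == 0xD || ch == 0xA;
}

// Hypothetical stand-in for a caller such as XMLString::trim: strips
// leading and trailing XML whitespace and nothing else.
inline std::u16string trimXMLSpace(const std::u16string& s)
{
    std::size_t begin = 0;
    std::size_t end = s.size();
    while (begin < end && isXMLSpace(s[begin]))
        ++begin;
    while (end > begin && isXMLSpace(s[end - 1]))
        --end;
    return s.substr(begin, end - begin);
}
```

With a check like this, 0xA counts as whitespace on every platform, while
characters such as 0xB, 0x3000 or 0xFEFF, which some transcoders currently
report as spaces, would no longer be trimmed by the callers listed above.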
Let me know if there are any objections...
Regards,
David A. Cargill
XML Parser Development
IBM Toronto Lab
(905) 413-2371, tie 969
[EMAIL PROTECTED]