On Wed, Nov 23, 2005 at 09:37:04AM -0500, Jeff Trawick wrote:
> On 11/23/05, Joe Orton <[EMAIL PROTECTED]> wrote:
> > On Sun, Nov 20, 2005 at 09:53:50AM -0500, Jeff Trawick wrote:
> > > On input path, ap_xml_parse_input() handles converting xml to native
> > > charset (at least in 2.2). On output, there is no provision for
> > > converting xml in responses.
> >
> > OK, pop quiz: how is a Unicode XML document getting converted into
> > EBCDIC on input without losing most of the character set along the way?
>
> unclear to me, at least...
>
> For this code:
>
> server/util_xml.c::ap_xml_parse_input():
> ...
> #if APR_CHARSET_EBCDIC
>     apr_xml_parser_convert_doc(r->pool, *pdoc, ap_hdrs_from_ascii);
> #endif
> ...
>
> The xml library apparently parses the input it well enough to
> understand the nodes. After that, it looks like the charset
> translation specified here (ap_hdrs_from_ascii) should use the real
> charset specified by the client. As it is, interesting* characters
> won't be handled correctly.

It looks like ap_hdrs_from_ascii is always an ISO-8859-1->EBCDIC xlate
handle, so this is really quite broken. The document will always be in
UTF-8 as returned by the XML parser, so every multi-byte sequence gets
translated byte by byte as if it were ISO-8859-1. At the very least I
guess it could try to do a UTF-8->EBCDIC conversion and fail if that's
not possible.

Sorting this stuff out is really a prerequisite to working out how to
get the output side right, I'd say...

joe
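
To make the "convert and fail if that's not possible" idea concrete, here is a rough sketch in plain C. It is only an illustration: utf8_to_latin1() is a hypothetical helper, not anything in APR or httpd. It also assumes the conversion is done in two steps, first narrowing the parser's UTF-8 to ISO-8859-1 (which the existing ISO-8859-1->EBCDIC handle could then consume) and refusing anything that doesn't fit in one byte.

```c
#include <stddef.h>

/* Hypothetical helper, not part of APR or httpd: narrow a UTF-8 buffer to
 * ISO-8859-1, one output byte per character.  Returns the number of bytes
 * written, or -1 if the input is malformed, contains a character above
 * U+00FF, or does not fit in the output buffer. */
long utf8_to_latin1(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    while (i < inlen) {
        unsigned char c = in[i];
        unsigned int cp;

        if (c < 0x80) {
            /* plain ASCII, one byte */
            cp = c;
            i += 1;
        }
        else if (c == 0xC2 || c == 0xC3) {
            /* two-byte sequences led by C2/C3 cover U+0080..U+00FF */
            if (i + 1 >= inlen || (in[i + 1] & 0xC0) != 0x80)
                return -1;      /* truncated or bad continuation byte */
            cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            i += 2;
        }
        else {
            /* everything else is above U+00FF or malformed UTF-8 */
            return -1;
        }

        if (o >= outlen)
            return -1;          /* output buffer too small */
        out[o++] = (unsigned char)cp;
    }
    return (long)o;
}
```

With this, "é" (bytes C3 A9) comes out as the single byte E9, whereas a byte-wise ISO-8859-1 translation of the same input would produce two unrelated characters. Something like the euro sign (E2 82 AC) is rejected outright rather than silently mangled. A real fix would presumably go through a proper UTF-8->EBCDIC xlate handle instead; whether one is available on a given platform is a separate question.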