On 18 Sep 2010, at 7:10 PM, Graham Leggett wrote:

When the SSI tag below is handled, the value of the string output to the browser is entity encoded:

<!--#echo encoding="entity" var="MY_VAR"-->

This is done with a line that looks something like this:

/* PR#25202: escape anything non-ascii here */
echo_text = ap_escape_html2(ctx->dpool, val, 1);

The problem with the above is the parameter "1": it causes every non-ASCII byte to be entity encoded as an HTML escape sequence. Because each byte of a multi-byte UTF-8 sequence is escaped separately, any non-ASCII text encoded as UTF-8 breaks in the process.

Looking further, the fix for PR25202 caused a regression, described in PR47686, in which UTF-8 support broke.

I've created a fix for this, in which the "set" and "echo" SSI commands have been taught to handle "encoding" and "decoding" parameters.

For both echo and set, the value is first decoded as specified by the "decoding" parameter, and then encoded as specified by the "encoding" parameter. This gives full control over how variables and echoed values are decoded and encoded, depending on where they came from.

Both "encoding" and "decoding" can contain multiple comma-separated values, applied in order. For example, to strip off URL encoding and then entity encoding before using a value, use decoding="url,entity".

Regards,
Graham
--

Attachment: httpd-mod_include-encoding-decoding.patch