Martijn Dekker dixit:
>So it looks like multibyte character support is not activated when
>commands are executed with -c.
Right. We just had this discussion a few weeks ago. Not a bug.
For the purpose of POSIX, mksh operates in the "C" locale,
and anything else is implementation-defined behaviour. mksh
does not track the LANG and LC_* variables. The one exception
is at startup: an interactive shell may look at them then,
depending on the compilation settings.
Read “man mksh”, section CAVEATS. Right at the end, there is:
For the purpose of POSIX, mksh supports only the "C" locale. For users of
UTF-8 locales, the following sh code makes the shell match the locale:
	case ${KSH_VERSION:-} in
	*MIRBSD\ KSH*|*LEGACY\ KSH*)
		case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
		*[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) set -U ;;
		*) set +U ;;
		esac ;;
	esac
Short form, if you know you’re running mksh already:
set -U; [[ ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} = *[Uu][Tt][Ff]?(-)8* ]] || set +U
The basic idea behind this is: on most OSes, scripts
do not explicitly export LC_ALL=C at startup, yet assume
it by historical tradition. Enabling UTF-8 mode for
scripts (and a -c argument is a scriptlet) would break too much.
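
A script (or -c scriptlet) that does want UTF-8 mode can opt in
explicitly. Below is a minimal sketch of that, using the same
LC_ALL, LC_CTYPE, LANG precedence as the manpage snippet. The
function name locale_wants_utf8 is made up for illustration and
is not part of mksh; the spaces in the KSH_VERSION patterns are
backslash-escaped so that other sh implementations parse it too.

```shell
# Hypothetical helper (the name is an assumption, not an mksh builtin):
# succeeds if the effective character-type locale names UTF-8.
# Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
locale_wants_utf8() {
	case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
	*[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) return 0 ;;
	esac
	return 1
}

# Only touch the U flag in shells that identify themselves as mksh;
# in any other shell this block is a no-op.
case ${KSH_VERSION:-} in
*MIRBSD\ KSH*|*LEGACY\ KSH*)
	if locale_wants_utf8; then
		set -U
	else
		set +U
	fi ;;
esac
```

Placed at the top of a script, this makes the choice explicit
instead of relying on whatever the shell defaults to for scripts.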
bye,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2