Martijn Dekker dixit:

>So it looks like multibyte character support is not activated when
>commands are executed with -c.

Right. We just had this discussion a few weeks ago. Not a bug.

For the purpose of POSIX, mksh operates in the "C" locale,
and anything else is implementation-defined behaviour. mksh
does not track the LANG and LC_* variables. The only exception
is at startup, where an interactive shell may look at them,
depending on the compilation settings.

Read “man mksh”, section CAVEATS. Right at the end, there is:

     For the purpose of POSIX, mksh supports only the "C" locale. For users of
     UTF-8 locales, the following sh code makes the shell match the locale:

           case ${KSH_VERSION:-} in
           *MIRBSD KSH*|*LEGACY KSH*)
                   case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
                   *[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) set -U ;;
                   *) set +U ;;
                   esac ;;
           esac

Short form, if you know you’re running mksh already:

set -U; [[ ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} = *[Uu][Tt][Ff]?(-)8* ]] || set +U

The basic idea behind this is: on most OSes, scripts do
not explicitly export LC_ALL=C at startup, yet assume
that locale out of historical tradition. Enabling UTF-8
mode for scripts (and -c is a scriptlet) would break too much.
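If a particular -c scriptlet should nevertheless follow a UTF-8
locale, one option is to put the short form from above at the
start of the command string itself. A minimal sketch, assuming
mksh is in $PATH; the wrapper name mksh_c_utf8 is hypothetical,
not part of mksh:

```shell
#!/bin/sh
# mksh_c_utf8 (hypothetical helper): run a command string under
# mksh -c, but first switch utf8-mode (set -U) on or off to match
# LC_ALL / LC_CTYPE / LANG, since mksh -c does not do that itself.
# The command string arrives in the inner shell as $1 (the word
# "mksh_c_utf8" becomes $0) and is run with eval.
mksh_c_utf8() {
	mksh -c 'set -U
[[ ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} = *[Uu][Tt][Ff]?(-)8* ]] || set +U
eval "$1"' mksh_c_utf8 "$1"
}
```

For example, with LC_ALL=C.UTF-8 exported,
mksh_c_utf8 'x=äöü; echo ${#x}' should print 3 (characters),
whereas with LC_ALL=C it prints 6 (bytes).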

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
        -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
