William A. Rowe Jr. <wr...@rowe-clan.net> wrote in his message of Tue, 13 Apr 2010 19:18:57 +0500:

And what is the encoding of that file?  Certainly no assurance that data
is unicode, or one of the local code pages.  APR can't and doesn't try to
deal with the representation of data passed around using APR.  In general
windows environment is very good about handling utf-8 data, although it's
irritating in the insistence on polluting streams with BOM's.

I agree that you can't reliably predict what encoding a file is in, but I assert the system ANSI code page (which apr_os_locale_encoding should IMO return) is a reasonable default. It's certainly not the user locale's code page (which it currently returns) — because nothing uses that. 8=]

Something APR should address, is that -printing- that to a console stream,
a utf-8 stream can easily be handled with unicode.  That's a problem apr
could reasonably solve for command line apps.

Perhaps, but printing to the console is not what's broken here.

or when I'm printing the username that I got from apr_uid_name_get.

... will always be utf-8, back to my point about external representations.
Internally, APR always pulls from the Win32 Unicode functions.


Um, that's just not true.

APR_DECLARE(apr_status_t) apr_uid_name_get(char **username, apr_uid_t userid,
                                           apr_pool_t *p)
{
    /* WinCE code snipped */
    SID_NAME_USE type;
    char name[MAX_PATH], domain[MAX_PATH];
    DWORD cbname = sizeof(name), cbdomain = sizeof(domain);
    if (!userid)
        return APR_EINVAL;
    if (!LookupAccountSid(NULL, userid, name, &cbname, domain, &cbdomain, &type))
        return apr_get_os_error();
    if (type != SidTypeUser && type != SidTypeAlias && type != SidTypeWellKnownGroup)
        return APR_EINVAL;
    *username = apr_pstrdup(p, name);
    return APR_SUCCESS;
}

It writes into a char buffer, so it calls the ANSI variant of LookupAccountSid (LookupAccountSidA), and the result is therefore in the system ANSI code page, not UTF-8. The same goes for the Unix version:

APR_DECLARE(apr_status_t) apr_uid_name_get(char **username, apr_uid_t userid,
                                           apr_pool_t *p)
{
    struct passwd *pw;
    struct passwd pwd;
    char pwbuf[PWBUF_SIZE];
    apr_status_t rv;

    rv = getpwuid_r(userid, &pwd, pwbuf, sizeof(pwbuf), &pw);
    if (rv) {
        return rv;
    }

    if (pw == NULL) {
        return APR_ENOENT;
    }

    /* thread-unsafe code snipped */
    *username = apr_pstrdup(p, pw->pw_name);
    return APR_SUCCESS;
}

getpwuid_r returns the raw byte representation of the username, which is in the locale encoding (well, Unix being byte-oriented, it can be an arbitrary binary string, but presumably the sysadmin uses the same encoding everywhere).

The difference is, on Unix, the result of apr_uid_name_get (and many other functions, I'm sure) is in the encoding detected by apr_os_locale_encoding, while on Windows this may not be the case.
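
To make the Unix side concrete, here is a minimal sketch outside of APR. The helper names raw_username, locale_codeset and dump_username are made up for illustration. It assumes a POSIX system and uses nl_langinfo(CODESET), which as far as I can tell is what apr_os_locale_encoding consults on Unix when it is available. The point is only that the bytes from getpwuid_r are handed back untouched, so the caller has to interpret them in whatever codeset the locale reports.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <langinfo.h>   /* nl_langinfo, CODESET */
#include <locale.h>     /* setlocale */
#include <pwd.h>        /* getpwuid_r, struct passwd */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>     /* getuid */

/* Hypothetical helper: copy the raw pw_name bytes for uid into buf.
 * No transcoding happens here, just like in apr_uid_name_get.
 * Returns 0 on success, -1 if the lookup fails or buf is too small. */
int raw_username(uid_t uid, char *buf, size_t buflen)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char scratch[4096];

    assert(buf != NULL);
    if (getpwuid_r(uid, &pwd, scratch, sizeof(scratch), &result) != 0
        || result == NULL)
        return -1;
    if (strlen(result->pw_name) >= buflen)
        return -1;
    strcpy(buf, result->pw_name);
    return 0;
}

/* Hypothetical helper: codeset name of the locale taken from the
 * environment (LANG, LC_ALL, LC_CTYPE). */
const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Print the codeset next to the username bytes in hex, so that any
 * non-ASCII bytes can be checked against that codeset by hand. */
void dump_username(uid_t uid)
{
    char name[256];
    const unsigned char *p;

    if (raw_username(uid, name, sizeof(name)) != 0) {
        puts("lookup failed");
        return;
    }
    printf("codeset: %s\nbytes:", locale_codeset());
    for (p = (const unsigned char *)name; *p != '\0'; ++p)
        printf(" %02x", *p);
    putchar('\n');
}
```

Calling dump_username(getuid()) from main prints both values side by side. An equivalent check on Windows would need the code page that LookupAccountSidA actually used, which is exactly the piece that does not line up there.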

Roman.
