On 5/23/16 2:44 AM, Michał Górny wrote:
> On Sun, 22 May 2016 13:04:40 -0400
> "Anthony G. Basile" <bas...@opensource.dyc.edu> wrote:
> 
>> From: "Anthony G. Basile" <bluen...@gentoo.org>
>>
>> The current method to check for a sane system locale is to use python's
>> ctypes.util.find_library() to construct a full library path to the
>> system libc.so and pass that path to ctypes.CDLL() so we can call
>> toupper() and tolower() directly.  However, this gets bogged down in
>> implementation details and fails with musl.
>>
>> We work around this design flaw in ctypes with a small python module
>> written in C which provides thin wrappers to toupper() and tolower(),
>> and only fall back on the current ctypes-based check when this module
>> is not available.
>>
>> This has been tested on glibc, uClibc and musl systems.
>>
>> X-Gentoo-bug: 571444
>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>
>> Signed-off-by: Anthony G. Basile <bluen...@gentoo.org>
>> ---
>>  pym/portage/util/locale.py   | 32 ++++++++++-----
>>  setup.py                     |  6 ++-
>>  src/portage_c_convert_case.c | 94 
>> ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>  create mode 100644 src/portage_c_convert_case.c
>>
>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>> index 2a15ea1..85ddd2b 100644
>> --- a/pym/portage/util/locale.py
>> +++ b/pym/portage/util/locale.py
>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>  import locale
>>  import logging
>>  import os
>> +import sys
>>  import textwrap
>>  import traceback
>>  
>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>      """
>>      The inner locale check function.
>>      """
>> -
>> -    libc_fn = find_library("c")
>> -    if libc_fn is None:
>> -            return None
>> -    libc = LoadLibrary(libc_fn)
>> -    if libc is None:
>> -            return None
>> +    try:
>> +            from portage_c_convert_case import _c_toupper, _c_tolower
>> +            libc_tolower = _c_tolower
>> +            libc_toupper = _c_toupper
> 
> Now I'm being picky... but if you named the functions toupper()
> and tolower(), you could actually import the whole module as 'libc'
> and have less code!

I see what you're saying, and its tempting because its elegant, but I'm
afraid of a clash of names.  I've got a bad feeling this will get us
into trouble later.

Let me play with this and see what happens.

> 
> Also it would be nice to actually make the module more generic. There
> are more places where we use CDLL, and all of them could eventually be
> supported by the module (unshare() would be much better done in C, for
> example).

Yeah I get your point here.  Let me convince myself first.

> 
>> +    except ImportError:
>> +            writemsg_level("!!! Unable to import 
>> portage_c_convert_case\n!!!\n",
>> +                    level=logging.WARNING, noiselevel=-1)
> 
> Do we really want to warn verbosely about this? I think it'd be
> a pretty common case for people running the git checkout.

This should stay.  Its good to know that the module is not being
imported and silently falling back on the ctypes stuff.

1) its only going to happen in the rare occasion that you're using
something like a turkish locale and can't import the module.

2) people who do a git checkout should add
PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
I can add something to testpath.  Users will have to be instructed to
run `./setup build` and then the script shoudl read something like this

unamem=$(uname -m)

pythonversion=$(python --version 2>&1 | cut -c8-)
pythonversion=${pythonversion%\.*}

portagedir=$(dirname ${BASH_SOURCE[0]})

export PATH="${portagedir}/bin:${PATH}"

export
PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"

export PYTHONWARNINGS=d,i::ImportWarning


BTW, the original code must have a bug in it.  It reads

export PYTHONPATH=PYTHONPATH="$(dirname
$BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"

The double PYTHONPATH=PYTHONPATH= can't be right.

> 
>> +            libc_fn = find_library("c")
>> +            if libc_fn is None:
>> +                    return None
>> +            libc = LoadLibrary(libc_fn)
>> +            if libc is None:
>> +                    return None
>> +            libc_tolower = libc.tolower
>> +            libc_toupper = libc.toupper
>>  
>>      lc = list(range(ord('a'), ord('z')+1))
>>      uc = list(range(ord('A'), ord('Z')+1))
>> -    rlc = [libc.tolower(c) for c in uc]
>> -    ruc = [libc.toupper(c) for c in lc]
>> +    rlc = [libc_tolower(c) for c in uc]
>> +    ruc = [libc_toupper(c) for c in lc]
>>  
>>      if lc != rlc or uc != ruc:
>>              if silent:
>> @@ -62,7 +71,10 @@ def _check_locale(silent):
>>                      "as LC_CTYPE in make.conf.")
>>              msg = [l for l in textwrap.wrap(msg, 70)]
>>              msg.append("")
>> -            chars = lambda l: ''.join(chr(x) for x in l)
>> +            if sys.version_info.major >= 3:
> 
> Portage uses hexversion for comparisons. Please be consistent.

I see that.  Yuck, why use sys.hexversion?  version_info is easier to
read.  Anyhow, I'll change it because consistency is important.

> 
>> +                    chars = lambda l: ''.join(chr(x) for x in l)
>> +            else:
>> +                    chars = lambda l: ''.join(chr(x).decode('utf-8', 
>> 'replace') for x in l)
> 
> This looks like an unrelated change. Was the original code buggy? If
> this is the case, then please fix it in a separate commit.

Yes the original code is buggy.  Either via the module or without, if x
is outside the ascii range, chr(x) fails to convert it properly and
throws an error.  This code was not tested:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
124, in check_locale
    ret = _check_locale(silent)
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
81, in _check_locale
    "  %s -> %s" % (chars(lc), chars(ruc)),
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
74, in <lambda>
    chars = lambda l: ''.join(chr(x) for x in l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
ordinal not in range(128)

I'll separate out that commit.

> 
>>              if uc != ruc:
>>                      msg.extend([
>>                              "  %s -> %s" % (chars(lc), chars(ruc)),
>> diff --git a/setup.py b/setup.py
>> index 25429bc..8b6b408 100755
>> --- a/setup.py
>> +++ b/setup.py
>> @@ -47,7 +47,11 @@ x_scripts = {
>>  # Dictionary custom modules written in C/C++ here.  The structure is
>>  #   key   = module name
>>  #   value = list of C/C++ source code, path relative to top source directory
>> -x_c_helpers = {}
>> +x_c_helpers = {
>> +    'portage_c_convert_case' : [
>> +            'src/portage_c_convert_case.c',
>> +    ],
>> +}
>>  
>>  class x_build(build):
>>      """ Build command with extra build_man call. """
>> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
>> new file mode 100644
>> index 0000000..f60b0c2
>> --- /dev/null
>> +++ b/src/portage_c_convert_case.c
>> @@ -0,0 +1,94 @@
>> +/* Copyright 2005-2016 Gentoo Foundation
>> + * Distributed under the terms of the GNU General Public License v2
>> + */
>> +
>> +#include <Python.h>
>> +#include <ctype.h>
>> +
>> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
>> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
>> +
>> +static PyMethodDef ConvertCaseMethods[] = {
>> +    {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case 
>> using system locale."},
>> +    {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case 
>> using system locale."},
>> +    {NULL, NULL, 0, NULL}
> 
> You should include stdlib.h or stdio.h for NULL.

yep.

> 
>> +};
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +static struct PyModuleDef moduledef = {
>> +    PyModuleDef_HEAD_INIT,
>> +    "portage_c_convert_case",                                       /* 
>> m_name */
>> +    "Module for converting case using the system locale",           /* 
>> m_doc */
>> +    -1,                                                             /* 
>> m_size */
>> +    ConvertCaseMethods,                                             /* 
>> m_methods */
>> +    NULL,                                                           /* 
>> m_reload */
>> +    NULL,                                                           /* 
>> m_traverse */
>> +    NULL,                                                           /* 
>> m_clear */
>> +    NULL,                                                           /* 
>> m_free */
>> +};
>> +#endif
>> +
>> +static PyObject *ConvertCaseError;
>> +
>> +PyMODINIT_FUNC
>> +#if PY_MAJOR_VERSION >= 3
>> +PyInit_portage_c_convert_case(void)
>> +#else
>> +initportage_c_convert_case(void)
>> +#endif
>> +{
>> +    PyObject *m;
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +    m = PyModule_Create(&moduledef);
>> +#else
>> +    m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
>> +#endif
>> +
>> +    if (m == NULL)
>> +#if PY_MAJOR_VERSION >= 3
>> +            return NULL;
>> +#else
>> +            return;
>> +#endif
>> +
>> +    ConvertCaseError = 
>> PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
>> +    Py_INCREF(ConvertCaseError);
>> +    PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +    return m;
>> +#else
>> +    return;
>> +#endif
>> +}
> 
> To be honest, I think we'd be happier having two big #ifdefs for init
> funcs rather than one function with a lot of #ifdefs, and the common
> ConvertCaseError part in a separate static function.

I can move things around and reduce the #ifdefs, especially if we drop
the custom exception.

> 
>> +
>> +
>> +static PyObject *
>> +portage_c_tolower(PyObject *self, PyObject *args)
>> +{
>> +    int c;
>> +
>> +    if (!PyArg_ParseTuple(args, "i", &c))
>> +    {
>> +            PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple 
>> failed");
> 
> From PyArg_ParseTuple() [1]:
> 
> | on failure, it returns false and raises the appropriate exception.
> 
> So I don't think you need or should use a custom exception here.
> 
> [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple

I am aware, but then you loose the specificity of the error, whether it
occurred in _c_tolower or _c_toupper.  Its small, so I'll remove it
because it also cleans up the #ifdef spaghetti, but if we were to expand
this module for other libc functions then we should consider re-adding it.

> 
>> +            return NULL;
>> +    }
>> +
>> +    return Py_BuildValue("i", tolower(c));
>> +}
>> +
>> +
>> +static PyObject *
>> +portage_c_toupper(PyObject *self, PyObject *args)
>> +{
>> +    int c;
>> +
>> +    if (!PyArg_ParseTuple(args, "i", &c))
>> +    {
>> +            PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple 
>> failed");
>> +            return NULL;
>> +    }
>> +
>> +    return Py_BuildValue("i", toupper(c));
>> +}
> 
> Thanks a lot for your effort. This is really getting in shape.

I'm doing this because check_locale() broke portage on musl.  I'm not a
frequent contributor to portage so I'd ask the portage team to free free
style it up for me.  vapier does that to my code all the time :)


-- 
Anthony G. Basile, Ph. D.
Chair of Information Technology
D'Youville College
Buffalo, NY 14201
(716) 829-8197

Reply via email to