On 5/23/16 10:25 AM, Michał Górny wrote:
> On Mon, 23 May 2016 08:08:18 -0400
> "Anthony G. Basile" <bas...@opensource.dyc.edu> wrote:
> 
>> On 5/23/16 2:44 AM, Michał Górny wrote:
>>> On Sun, 22 May 2016 13:04:40 -0400
>>> "Anthony G. Basile" <bas...@opensource.dyc.edu> wrote:
>>>   
>>>> From: "Anthony G. Basile" <bluen...@gentoo.org>
>>>>
>>>> The current method to check for a sane system locale is to use python's
>>>> ctypes.util.find_library() to construct a full library path to the
>>>> system libc.so and pass that path to ctypes.CDLL() so we can call
>>>> toupper() and tolower() directly.  However, this gets bogged down in
>>>> implementation details and fails with musl.
>>>>
>>>> We work around this design flaw in ctypes with a small python module
>>>> written in C which provides thin wrappers to toupper() and tolower(),
>>>> and only fall back on the current ctypes-based check when this module
>>>> is not available.
>>>>
>>>> This has been tested on glibc, uClibc and musl systems.
>>>>
>>>> X-Gentoo-bug: 571444
>>>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>>>
>>>> Signed-off-by: Anthony G. Basile <bluen...@gentoo.org>
>>>> ---
>>>>  pym/portage/util/locale.py   | 32 ++++++++++-----
>>>>  setup.py                     |  6 ++-
>>>>  src/portage_c_convert_case.c | 94 
>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>>>  create mode 100644 src/portage_c_convert_case.c
>>>>
>>>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>>>> index 2a15ea1..85ddd2b 100644
>>>> --- a/pym/portage/util/locale.py
>>>> +++ b/pym/portage/util/locale.py
>>>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>>>  import locale
>>>>  import logging
>>>>  import os
>>>> +import sys
>>>>  import textwrap
>>>>  import traceback
>>>>  
>>>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>>>    """
>>>>    The inner locale check function.
>>>>    """
>>>> -
>>>> -  libc_fn = find_library("c")
>>>> -  if libc_fn is None:
>>>> -          return None
>>>> -  libc = LoadLibrary(libc_fn)
>>>> -  if libc is None:
>>>> -          return None
>>>> +  try:
>>>> +          from portage_c_convert_case import _c_toupper, _c_tolower
>>>> +          libc_tolower = _c_tolower
>>>> +          libc_toupper = _c_toupper  
>>>
>>> Now I'm being picky... but if you named the functions toupper()
>>> and tolower(), you could actually import the whole module as 'libc'
>>> and have less code!  
>>
>> I see what you're saying, and its tempting because its elegant, but I'm
>> afraid of a clash of names.  I've got a bad feeling this will get us
>> into trouble later.
>>
>> Let me play with this and see what happens.
> 
> I don't think this will be problematic since things like this happen
> in Python all the time ;-). And after all, C function names can be
> different than Python function names.

It works fine so my last set of patches adopts this approach.

> 
>>> Also it would be nice to actually make the module more generic. There
>>> are more places where we use CDLL, and all of them could eventually be
>>> supported by the module (unshare() would be much better done in C, for
>>> example).  
>>
>> Yeah I get your point here.  Let me convince myself first.
> 
> I've got a killer argument: right now we hardcode constants from Linux
> headers in the Python code!
> 
> Not that I'm asking you to actually add code for that as well. Just
> rename the module to something more generic like portage.util.libc ;-).

Well you might as well point me in this direction since I'm working on
this now.

> 
>>>> +  except ImportError:
>>>> +          writemsg_level("!!! Unable to import 
>>>> portage_c_convert_case\n!!!\n",
>>>> +                  level=logging.WARNING, noiselevel=-1)  
>>>
>>> Do we really want to warn verbosely about this? I think it'd be
>>> a pretty common case for people running the git checkout.  
>>
>> This should stay.  Its good to know that the module is not being
>> imported and silently falling back on the ctypes stuff.
>>
>> 1) its only going to happen in the rare occasion that you're using
>> something like a turkish locale and can't import the module.
> 
> Wrong. This happens before the check is done, so it will be output
> every time Portage is started, also with good locale.

Right I dropped it.

> 
>> 2) people who do a git checkout should add
>> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
>> I can add something to testpath.  Users will have to be instructed to
>> run `./setup build` and then the script shoudl read something like this
>>
>> unamem=$(uname -m)
>>
>> pythonversion=$(python --version 2>&1 | cut -c8-)
>> pythonversion=${pythonversion%\.*}
>>
>> portagedir=$(dirname ${BASH_SOURCE[0]})
>>
>> export PATH="${portagedir}/bin:${PATH}"
>>
>> export
>> PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"
>>
>> export PYTHONWARNINGS=d,i::ImportWarning
>>
>>
>> BTW, the original code must have a bug in it.  It reads
>>
>> export PYTHONPATH=PYTHONPATH="$(dirname
>> $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
>>
>> The double PYTHONPATH=PYTHONPATH= can't be right.
> 
> You are probably right. However:
> 
> 1. Since bin/ scripts are setting PYTHONPATH appropriately, you should
> probably update all of them as well to work out-of-the-box.
> 
> 2. Don't do bash hackery and guessing. Take correct paths directly out
> of Python modules (distutils should provide necessary means to get
> the defaults).
> 
> 3. Remember that something.cfg can be used to override those paths.
> Though if you use distutils well enough, it should be able to figure
> that out for oyu.

I didn't touched testpath because the only way I could get this working
right is to do a ./setup.py install --root= and I'm not sure we want to
totally install portage.  It kind of defeats the purpose of just running
portage from checkout.  If I find a better way I'll make another commit.

> 
>>>> +          libc_fn = find_library("c")
>>>> +          if libc_fn is None:
>>>> +                  return None
>>>> +          libc = LoadLibrary(libc_fn)
>>>> +          if libc is None:
>>>> +                  return None
>>>> +          libc_tolower = libc.tolower
>>>> +          libc_toupper = libc.toupper
>>>>  
>>>>    lc = list(range(ord('a'), ord('z')+1))
>>>>    uc = list(range(ord('A'), ord('Z')+1))
>>>> -  rlc = [libc.tolower(c) for c in uc]
>>>> -  ruc = [libc.toupper(c) for c in lc]
>>>> +  rlc = [libc_tolower(c) for c in uc]
>>>> +  ruc = [libc_toupper(c) for c in lc]
>>>>  
>>>>    if lc != rlc or uc != ruc:
>>>>            if silent:
>>>> @@ -62,7 +71,10 @@ def _check_locale(silent):
>>>>                    "as LC_CTYPE in make.conf.")
>>>>            msg = [l for l in textwrap.wrap(msg, 70)]
>>>>            msg.append("")
>>>> -          chars = lambda l: ''.join(chr(x) for x in l)
>>>> +          if sys.version_info.major >= 3:  
>>>
>>> Portage uses hexversion for comparisons. Please be consistent.  
>>
>> I see that.  Yuck, why use sys.hexversion?  version_info is easier to
>> read.  Anyhow, I'll change it because consistency is important.
> 
> 1. I think old versions of Python did not support named properties in
> sys.version_info, back when Portage was written.

I didn't need this because I found _unicode_decode() which does what I
want.  Thanks for the clue.  BTW, why are those functions/classes in
pym/portage/__init__.py?  All that code in there is just cluttering
__init__.py.  Shouldn't that stuff be pulled into a separate file and
imported cleanly?

> 
> 2. hexversion makes it easy to quickly check for specific versions like
> 3.2 (yep, Portage does that in some places because Python upstream
> can't keep API sane).
> 
>>>> +                  chars = lambda l: ''.join(chr(x) for x in l)
>>>> +          else:
>>>> +                  chars = lambda l: ''.join(chr(x).decode('utf-8', 
>>>> 'replace') for x in l)  
>>>
>>> This looks like an unrelated change. Was the original code buggy? If
>>> this is the case, then please fix it in a separate commit.  
>>
>> Yes the original code is buggy.  Either via the module or without, if x
>> is outside the ascii range, chr(x) fails to convert it properly and
>> throws an error.  This code was not tested:
>>
>> Traceback (most recent call last):
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 124, in check_locale
>>     ret = _check_locale(silent)
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 81, in _check_locale
>>     "  %s -> %s" % (chars(lc), chars(ruc)),
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 74, in <lambda>
>>     chars = lambda l: ''.join(chr(x) for x in l)
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
>> ordinal not in range(128)
>>
>> I'll separate out that commit.
> 
> Thanks. Also please try if you can get a single code path for py2/py3.
> You may want to take a look at portage.util for decoding/encoding
> wrappers.
> 
>>>>            if uc != ruc:
>>>>                    msg.extend([
>>>>                            "  %s -> %s" % (chars(lc), chars(ruc)),
>>>> diff --git a/setup.py b/setup.py
>>>> index 25429bc..8b6b408 100755
>>>> --- a/setup.py
>>>> +++ b/setup.py
>>>> @@ -47,7 +47,11 @@ x_scripts = {
>>>>  # Dictionary custom modules written in C/C++ here.  The structure is
>>>>  #   key   = module name
>>>>  #   value = list of C/C++ source code, path relative to top source 
>>>> directory
>>>> -x_c_helpers = {}
>>>> +x_c_helpers = {
>>>> +  'portage_c_convert_case' : [
>>>> +          'src/portage_c_convert_case.c',
>>>> +  ],
>>>> +}
>>>>  
>>>>  class x_build(build):
>>>>    """ Build command with extra build_man call. """
>>>> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
>>>> new file mode 100644
>>>> index 0000000..f60b0c2
>>>> --- /dev/null
>>>> +++ b/src/portage_c_convert_case.c
>>>> @@ -0,0 +1,94 @@
>>>> +/* Copyright 2005-2016 Gentoo Foundation
>>>> + * Distributed under the terms of the GNU General Public License v2
>>>> + */
>>>> +
>>>> +#include <Python.h>
>>>> +#include <ctype.h>
>>>> +
>>>> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
>>>> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
>>>> +
>>>> +static PyMethodDef ConvertCaseMethods[] = {
>>>> +  {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case 
>>>> using system locale."},
>>>> +  {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case 
>>>> using system locale."},
>>>> +  {NULL, NULL, 0, NULL}  
>>>
>>> You should include stdlib.h or stdio.h for NULL.  
>>
>> yep.

yeah done.  Arferver pointed out that Python.h pulls in stdlib.h, but in
case Python.h changes structure in the future and drops it, we're covered.

>>
>>>   
>>>> +};
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +static struct PyModuleDef moduledef = {
>>>> +  PyModuleDef_HEAD_INIT,
>>>> +  "portage_c_convert_case",                                       /* 
>>>> m_name */
>>>> +  "Module for converting case using the system locale",           /* 
>>>> m_doc */
>>>> +  -1,                                                             /* 
>>>> m_size */
>>>> +  ConvertCaseMethods,                                             /* 
>>>> m_methods */
>>>> +  NULL,                                                           /* 
>>>> m_reload */
>>>> +  NULL,                                                           /* 
>>>> m_traverse */
>>>> +  NULL,                                                           /* 
>>>> m_clear */
>>>> +  NULL,                                                           /* 
>>>> m_free */
>>>> +};
>>>> +#endif
>>>> +
>>>> +static PyObject *ConvertCaseError;
>>>> +
>>>> +PyMODINIT_FUNC
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +PyInit_portage_c_convert_case(void)
>>>> +#else
>>>> +initportage_c_convert_case(void)
>>>> +#endif
>>>> +{
>>>> +  PyObject *m;
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +  m = PyModule_Create(&moduledef);
>>>> +#else
>>>> +  m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
>>>> +#endif
>>>> +
>>>> +  if (m == NULL)
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +          return NULL;
>>>> +#else
>>>> +          return;
>>>> +#endif
>>>> +
>>>> +  ConvertCaseError = 
>>>> PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
>>>> +  Py_INCREF(ConvertCaseError);
>>>> +  PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +  return m;
>>>> +#else
>>>> +  return;
>>>> +#endif
>>>> +}  
>>>
>>> To be honest, I think we'd be happier having two big #ifdefs for init
>>> funcs rather than one function with a lot of #ifdefs, and the common
>>> ConvertCaseError part in a separate static function.  
>>
>> I can move things around and reduce the #ifdefs, especially if we drop
>> the custom exception.
> 
> Thanks.

I think you'll agree its clean now.

> 
>>>> +
>>>> +
>>>> +static PyObject *
>>>> +portage_c_tolower(PyObject *self, PyObject *args)
>>>> +{
>>>> +  int c;
>>>> +
>>>> +  if (!PyArg_ParseTuple(args, "i", &c))
>>>> +  {
>>>> +          PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple 
>>>> failed");  
>>>
>>> From PyArg_ParseTuple() [1]:
>>>
>>> | on failure, it returns false and raises the appropriate exception.
>>>
>>> So I don't think you need or should use a custom exception here.
>>>
>>> [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple  
>>
>> I am aware, but then you loose the specificity of the error, whether it
>> occurred in _c_tolower or _c_toupper.  Its small, so I'll remove it
>> because it also cleans up the #ifdef spaghetti, but if we were to expand
>> this module for other libc functions then we should consider re-adding it.
> 
> Are you really sure about that? I'd rather see what kind of wrong
> argument has been passed rather than 'PyArg_ParseTuple failed' which
> tells me nothing.

I dropped the custom exception.  But if we expand portage.util.libc, we
should reintroduce something like ErrnoException to pass back errno upon
failure.  Eg. with unshare() you'll want EINVAL or EPERM thrown up
through the call stack.

> 
>>>> +          return NULL;
>>>> +  }
>>>> +
>>>> +  return Py_BuildValue("i", tolower(c));
>>>> +}
>>>> +
>>>> +
>>>> +static PyObject *
>>>> +portage_c_toupper(PyObject *self, PyObject *args)
>>>> +{
>>>> +  int c;
>>>> +
>>>> +  if (!PyArg_ParseTuple(args, "i", &c))
>>>> +  {
>>>> +          PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple 
>>>> failed");
>>>> +          return NULL;
>>>> +  }
>>>> +
>>>> +  return Py_BuildValue("i", toupper(c));
>>>> +}  
>>>
>>> Thanks a lot for your effort. This is really getting in shape.  
>>
>> I'm doing this because check_locale() broke portage on musl.  I'm not a
>> frequent contributor to portage so I'd ask the portage team to free free
>> style it up for me.  vapier does that to my code all the time :)
> 


-- 
Anthony G. Basile, Ph. D.
Chair of Information Technology
D'Youville College
Buffalo, NY 14201
(716) 829-8197

Reply via email to