On 2.12.2015 14:52, Petr Viktorin wrote:
On 12/02/2015 08:59 AM, Jan Cholasta wrote:
On 1.12.2015 12:26, Petr Viktorin wrote:
On 11/30/2015 08:59 AM, Jan Cholasta wrote:
On 25.11.2015 15:47, Petr Viktorin wrote:
On 11/25/2015 11:04 AM, Jan Cholasta wrote:
On 24.11.2015 17:21, Petr Viktorin wrote:
On 11/23/2015 10:50 AM, Jan Cholasta wrote:
On 23.11.2015 07:43, Jan Cholasta wrote:
On 19.11.2015 00:55, Petr Viktorin wrote:
On 11/03/2015 02:39 PM, Petr Viktorin wrote:
Python 3's strings are Unicode, so data coming to or leaving a
program needs to be decoded/encoded if it's to be handled as a
One of the boundaries where encoding is necessary is external
specifically, ipautil.run.
Unfortunately there's no one set of options that would work
for all
run() invocations, so I went through them all and specified the
stdin/stdout/stderr encoding where necessary. I've also
updated the
sites to make it clearer if the return values are used or not.
If an encoding is not set, run() will accept/return bytes. (This
is a
fail-safe setting, since it can't raise errors, and using bytes
strings are expected generally fails loudly in py3.)

Note that the changes are not effective under Python 2.

Could someone look at this patch?


1) I think a better approach would be to use str for
by default in both Python 2 and 3, i.e. do nothing in Python 2 and
encode/decode using locale.getpreferredencoding() in Python 3.
This is
consistent with sys.std{in,out,err} in both Python 2 and 3. Using
or different encoding should be opt-in.

Note that different encoding should be used only if the LC_ALL or
variables are overriden in the command's environment.

That would assume the output of *everything* run via ipautil.run
can be
decoded using the locale encoding. Any stray invalid byte would make
crash, even in cases where we don't care about the output. IPA calls
many weird tools to assume they all output text.

Such a stray invalid byte may bubble through our code and cause havoc
somewhere else, which is much harder to troubleshoot than a crash in
ipautil.run(), where you can see that it was a misbehaving command
caused the crash and exactly what command it was.

The invalid byte can't bubble through code, because if you don't
an encoding you get bytes back. You'll get a crash when you try
using it
as a string.

Invalid bytes *can* bubble through our code, into configuration files
and other external places. The fact that it has not caused much grief so
far suggests that it should be safe to use the locale encoding for most

Yes, "suggests" for "most". I'm still wary of using the locale encoding
for everything by default.

OK, let's keep bytes as the default, but I still think the locale
encoding should be the default when decoding output to text.

If we do subscribe to the statement that any command can output invalid
bytes at any time, I don't see a point in handling encoding in
ipautil.run() at all - just always return bytes and let the caller worry
about encoding and handling related errors.

Right, that's actually the basic idea.

However, the encoding is two lines of boilerplate code ("if six.PY3" and
the actual encode). I've turned this into a convenience keyword argument
to minimize the repetition.

If you always return bytes in both Python 2 and 3, you can always
.decode() the output, so there won't be any "if six.PY3". Bearing that
in mind, I don't see much difference between adding ",
stdout_encoding='ascii'" to the run() call or "stdout =
stdout.decode('ascii')" on the line below, except the latter is more
explicit and lets you handle decoding errors yourself.

locale.getpreferredencoding() is not used consistently in all software,
especially if it's not written in Python. For instance, wget won't
magically re-encode the data it fetches for us. It's better to
explicitly specify the encoding every time.

I would say it is used consistently in most software used by IPA. The
wget example is obviously not relevant to this discussion, since it's
returning external data.

IMO setting the encoding to 'ascii' without also setting LC_ALL=C in the
environment, as seen frequently in your patch, has even higher
probability of breaking something than using the locale encoding.

The patch should use 'ascii' with programs that don't output localized
messages: things like base64-encoded certs or the output of `ip`.
If you see  a particular call where that's not the case, it's a bug in
the patch, please point it out.

IMO it's better to be safe than sorry, so I would like to see either
LC_ALL=C set with every use of 'ascii' encoding, or use the locale
encoding instead of 'ascii'.

If we don't care about output somewhere, we should not capture it
at all.

Then people need to remember to put "capture_output=True" everywhere
(except that also disables logging of the output, so we'd need an
additional option).

capture_output=True is the default and capture_output=False should be
set where output is not needed.

"capture_output=True" being the default would mean that it's still
possible to leave "capture_output=False" out, but ignore the output. It
would make the wrong usage easier to type than the right one, which is
something to avoid.

I'm not sure I understand what are you trying to say here.

My point is that capture_output=True is currently the default (see for
yourself) and it is indeed possible to leave capture_output=False but
ignore the output. I believe this is wrong and that you should always
have to explicitly request the output to be captured.

OK, looks like I will need to refactor ipautil.run even for Python 2, then.

If this is just about changing the default and fixing all run() invocations, I can provide the patch myself, if that would work better for you.

I seem to have have problems coming up with a solution that would match
your expectations, so let me write the semantics as I understand you'd
like hem:

Would you be happy with the following signature?

def run(args, stdin=None, raiseonerr=True,
         nolog=(), env=None,
         capture_output=False, skip_output=False,
         cwd=None, runas=None, timeout=None, suplementary_groups=[],

Output and error would be always logged, except if skip_output is set.
But they would be only returned if capture_output/capture_stderr were set.

I would personally also rename capture_output to capture_stdout. If we change the default, we have to update its every use, so we might as well rename it. (Again, I can provide the patch.)

If encode_* is False, bytes are returned, by default
locale.getpreferredencoding() is used. (If the caller wants any other
encoding, it can do that by itself.)

I thought bytes should be the default after all? See my comment in the previous mail.

stdin it encoded using locale.getpreferredencoding() if it's unicode.

If we do encode/decode in run(), I think there should be a way to override the default encoding. My idea was to either accept/return only bytes and let the caller handle encoding themselves, or to handle encoding in run(), but be able to override the defaults.

Given we handle encoding in run(), I would imagine it like this:

    def run(args, stdin=None, raiseonerr=True, nolog=(), env=None,
            capture_stdout=False, skip_output=False, cwd=None,
            runas=None, timeout=None, suplementary_groups=[],
            capture_stderr=False, encode_stdout=False,
            encode_stderr=False, encoding=None):

i.e. nothing is captured by default, when stdout or stderr are captured they are returned as bytes by default, when stdout or stderr are returned as text they are decoded using the locale encoding by default, or the encoding can be explicitly specified.

Do you think this is feasible?

Also, in this case I'd like the function would return a namedtuple, so
we don't have to set unused variables all the time. But I can leave that
as is if you disagree.

Definitely, go for it.

Would that be OK?

Jan Cholasta

Manage your subscription for the Freeipa-devel mailing list:
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code

Reply via email to