On 28.2.2012 18:02, Petr Viktorin wrote:
On 02/28/2012 04:45 PM, Rob Crittenden wrote:
Petr Viktorin wrote:
On 02/28/2012 04:02 AM, Rob Crittenden wrote:
Petr Viktorin wrote:
On 02/27/2012 05:10 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Simo Sorce wrote:
On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
We are pretty trusting that the data coming out of LDAP matches
schema but it is possible to stuff non-printable characters into

I've added a sanity checker to keep a value as a python str type
(treated as binary internally). This will result in a base64
blob be returned to the client.

I don't like the idea of having arbitrary binary data where unicode strings are expected. It might cause some unexpected errors (I have a feeling that --addattr and/or --delattr and possibly some plugins might not handle this very well). Wouldn't it be better to just throw away the value if it's invalid and warn the user?

Shouldn't you try to parse it as a unicode string and catch
TypeError to
know when to return it as binary ?


What we do now is the equivalent of unicode(chr(0)) which returns
u'\x00' and is why we are failing now.

I believe there is a unicode category module, we might be able to
that if there is a category that defines non-printable characters.


Like this:

import unicodedata

def contains_non_printable(val):
for c in val:
if unicodedata.category(unicode(c)) == 'Cc':
return True
return False

This wouldn't have the exclusion of tab, CR and LF like using ord()
is probably more correct.


Freeipa-devel mailing list

If you're protecting the XML-RPC calls, it'd probably be better to
at the XML spec directly: http://www.w3.org/TR/xml/#charsets

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |

I'd say this is a good set for CLI as well.

And you can trap invalid UTF-8 sequences by catching the
UnicodeDecodeError from decode().

I don't think we should care about XML-RPC in LDAP-specific code at all. If you want to do some additional checks, do them in XML-RPC-specific code.

Replace my ord function with a regex that looks for invalid characters.
Now catching that exception too and leaving as a str type.


You're matching an Unicode regex against a bytestring. It'd be better to
match after the decode.

The unicode regex was force of habit. I purposely do this comparison
before the decoding to save decoding something that is binary. Would it
be acceptable if I just remove the u'?


No: re.search('[\uDFFF]', u'\uDFFF'.encode('utf-8')) --> None
Actually I'm not sure what encoding is used behind the scenes here; I
wouldn't recommend mixing Unicode and bytestrings even if it did work on
my machine.

Also, match() only looks at the beginning of the string; use search().
Sorry for not catching that earlier.

OCD notes:
- the "matches" variable name is misleading, there's either one match or
- use `is not None` instead of `!= None` as per PEP8.
- The contains_non_printable method seems a bit excessive now that it
essentially just contains one call. Unless it's meant to be subclassed,
in which case the docstring should probably look different.


Jan Cholasta

Freeipa-devel mailing list

Reply via email to