Re: [Freeipa-devel] [PATCH] 971 detect binary LDAP data

Jan Cholasta Tue, 28 Feb 2012 09:54:38 -0800

On 28.2.2012 18:02, Petr Viktorin wrote:

On 02/28/2012 04:45 PM, Rob Crittenden wrote:

Petr Viktorin wrote:

On 02/28/2012 04:02 AM, Rob Crittenden wrote:

Petr Viktorin wrote:

On 02/27/2012 05:10 PM, Rob Crittenden wrote:

Rob Crittenden wrote:

Simo Sorce wrote:

On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:

We are pretty trusting that the data coming out of LDAP matches
its
schema but it is possible to stuff non-printable characters into
most
attributes.


I've added a sanity checker to keep a value as a python str type
(treated as binary internally). This will result in a base64
encoded
blob be returned to the client.

I don't like the idea of having arbitrary binary data where unicodestrings are expected. It might cause some unexpected errors (I have afeeling that --addattr and/or --delattr and possibly some plugins mightnot handle this very well). Wouldn't it be better to just throw away thevalue if it's invalid and warn the user?


Shouldn't you try to parse it as a unicode string and catch
TypeError to
know when to return it as binary ?

Simo.


What we do now is the equivalent of unicode(chr(0)) which returns
u'\x00' and is why we are failing now.

I believe there is a unicode category module, we might be able to
use
that if there is a category that defines non-printable characters.

rob


Like this:

import unicodedata

def contains_non_printable(val):
for c in val:
if unicodedata.category(unicode(c)) == 'Cc':
return True
return False

This wouldn't have the exclusion of tab, CR and LF like using ord()
but
is probably more correct.

rob

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


If you're protecting the XML-RPC calls, it'd probably be better to
look
at the XML spec directly: http://www.w3.org/TR/xml/#charsets

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

I'd say this is a good set for CLI as well.

And you can trap invalid UTF-8 sequences by catching the
UnicodeDecodeError from decode().

I don't think we should care about XML-RPC in LDAP-specific code at all.If you want to do some additional checks, do them in XML-RPC-specific code.


Replace my ord function with a regex that looks for invalid characters.
Now catching that exception too and leaving as a str type.

rob


You're matching an Unicode regex against a bytestring. It'd be better to
match after the decode.


The unicode regex was force of habit. I purposely do this comparison
before the decoding to save decoding something that is binary. Would it
be acceptable if I just remove the u'?

rob


No: re.search('[\uDFFF]', u'\uDFFF'.encode('utf-8')) --> None
Actually I'm not sure what encoding is used behind the scenes here; I
wouldn't recommend mixing Unicode and bytestrings even if it did work on
my machine.

Also, match() only looks at the beginning of the string; use search().
Sorry for not catching that earlier.


OCD notes:
- the "matches" variable name is misleading, there's either one match or
None.
- use `is not None` instead of `!= None` as per PEP8.
- The contains_non_printable method seems a bit excessive now that it
essentially just contains one call. Unless it's meant to be subclassed,
in which case the docstring should probably look different.


Honza

--
Jan Cholasta

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Re: [Freeipa-devel] [PATCH] 971 detect binary LDAP data

Reply via email to