On 03/29/2012 12:20 AM, Rob Crittenden wrote:
Jan Cholasta wrote:
On 29.2.2012 15:45, Rob Crittenden wrote:
Jan Cholasta wrote:
On 28.2.2012 18:58, Rob Crittenden wrote:
Jan Cholasta wrote:
On 28.2.2012 18:02, Petr Viktorin wrote:
On 02/28/2012 04:45 PM, Rob Crittenden wrote:
Petr Viktorin wrote:
On 02/28/2012 04:02 AM, Rob Crittenden wrote:
Petr Viktorin wrote:
On 02/27/2012 05:10 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Simo Sorce wrote:
On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
We are pretty trusting that the data coming out of LDAP
matches
its
schema but it is possible to stuff non-printable characters
into
most
attributes.

I've added a sanity checker to keep a value as a python str
type
(treated as binary internally). This will result in a base64
encoded
blob be returned to the client.

I don't like the idea of having arbitrary binary data where unicode
strings are expected. It might cause some unexpected errors (I have a
feeling that --addattr and/or --delattr and possibly some plugins
might
not handle this very well). Wouldn't it be better to just throw away
the
value if it's invalid and warn the user?

This isn't for user input, it is for data stored in LDAP. User's are
going to have no way to provide binary data to us unless they use the
API themselves in which case they have to follow our rules.

Well my point was that --addattr and --delattr cause an LDAP search for
the given attribute and plugins might get the result of a LDAP
search in
their post_callback and I'm not sure if they can cope with binary data.

It wouldn't be any different than if we had the value as a unicode.

Let's see what happens if the mail attribute of a user contains invalid
UTF-8 (ff 62 30 72 6b 65 64):

$ ipa user-find jdoe
--------------
1 user matched
--------------
User login: jdoe
First name: John
Last name: Doe
Home directory: /home/jdoe
Login shell: /bin/sh
Email address: /2IwcmtlZA==
UID: 526
GID: 526
Account disabled: False
Password: False
Kerberos keys available: False
----------------------------
Number of entries returned 1
----------------------------

$ ipa user-mod jdoe --addattr mail=j...@example.com
ipa: ERROR: an internal error has occurred

The internal error is:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line
302, in wsgi_execute
result = self.Command[name](*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 438, in
__call__
ret = self.run(*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 696, in
run
return self.execute(*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/plugins/baseldap.py", line
1217, in execute
ldap, dn, entry_attrs, attrs_list, *keys, **options
File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
532, in pre_callback
entry_attrs['mail'] = self.obj._normalize_email(entry_attrs['mail'])
File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
338, in _normalize_email
norm_email.append(m + u'@' + config['ipadefaultemaildomain'][0])
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0:
invalid start byte

$ ipa user-mod jdoe --delattr mail=/2IwcmtlZA==
ipa: ERROR: mail does not contain '/2IwcmtlZA=='

$ ipa user-mod jdoe --delattr mail=`echo 'ff 62 30 72 6b 65 64' | xxd -p
-r`
ipa: ERROR: UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in
position 5: invalid start byte
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1242, in run
sys.exit(api.Backend.cli.run(argv))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1024, in run
kw = self.parse(cmd, argv)
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1049, in
parse
return dict(self.parse_iter(cmd, kw))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1058, in
parse_iter
yield (key, self.Backend.textui.decode(value))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 136, in
decode
return value.decode(encoding)
File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 5:
invalid start byte
ipa: ERROR: an internal error has occurred

I'm sure there is a lot more places in the code where things will break
when you feed them arbitrary data.


We treat the python type str as binary data. Anything that is a str gets
based64 encoded before json or xml-rpc transmission.

The type unicode is considered a "string" and goes in the clear.

We determine what this type should be not from the data but from the
schema. This is a big assumption. Hopefully this answer's Petr's point
as well.

We decided long ago that str means Binary and unicode means String. It
is a bit clumsy perhaps python handles it well. It will be more clear
when we switch to Python 3.0 and we'll have bytes and str instead as
types.

Well, this is all super-obvious and I'm not really sure why do you bring
it up here...

My concern is that with your approach, you can no longer be sure that an
attribute value is a valid unicode string. This creates a
maintainability burden, as you *always* have to keep in mind that you
first have to check whether the value is valid or not before touching
it. You can easily forget to include the check in new code and there is
a lot of places in old code that need to be changed because of this (as
you can see in the mail example above).

Let me quote John Dennis
(<http://www.redhat.com/archives/freeipa-devel/2012-January/msg00075.html>):


I'm very interested in this work. I too have recognized that we have a
very real structural problem with how we handle the types we use
internally, especially when it corresponds to an external type and
requires conversion. In fact we do a remarkably bad job in this area, we
have conversions which are done ad hoc all over the place. There is no
well established place for the conversions to occur and it's difficult
to know at any point in the code what type to expect.

Your patch adds yet another occurence of "it's difficult to know at any
point in the code what type to expect". IMO this is very bad and we
should not do this at all in new code.

I'm not sure if just dropping bad values (as I have suggested before) is
the best idea, but IMHO anything is better than letting these bad values
into the bowels of IPA.


We are trusting that the data in LDAP matches its schema. This is just
belt and suspenders verifying that it is the case.

Sure, but I still think we should allow any valid unicode data to come
from LDAP, not just what is valid in XML-RPC.

This won't affect data coming out of LDAP, only the way it is
transmitted to the client.

It will affect the data flowing through IPA, after we get it from LDAP,
before it is transmitted through XML-RPC.

In other words, I think it is more correct to do this:

LDAP -> check unicode validity -> (rest of IPA) -> check XML-RPC
validitity -> XML-RPC

than this:

LDAP -> check unicode validity -> check XML-RPC validity -> (rest of
IPA) -> XML-RPC

(Let's keep things isolated please.)


rob

Honza


I'm using a much narrower scope. I'm not trying to make it easy to
manage non-printable characters, just not blow things up if they exist.
Limiting to the XML-RPC supported set is for convenience, these are
unprintable characters in any context. This is just a two-fer.

Petr was right, I need to encode to unicode before doing this comparison
in order to compare invalid unicode characters so I moved that first.

I added a very basic unit test.

If you're wondering where this data might come from, two ways are via AD
sync and migration.

Yes, the user will be left in a situation where they'll need to use
--setattr or ldapmodify to manage the data in the field. The UI doesn't
show the value at all but instead shows [object Object] (no errors in
console, not sure where this is coming from). It is possible to
highlight this and insert a new value though.

"[object Object]" is an output of standard object's toString() method.

so when the data are binary they are converted to base64 and JSON looks like this:

var foo = {
   "__base64__": "FOOBAR"
}

foo.toString() //you get "[object Object]"


UI in most cases expects that it will receive simple value (string, number, boolean) or array of simple values or nothing, not an object with base64 string in it.

So I should probably file an UI ticket to handle such cases, right?


rob


_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


--
Petr Vobornik

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Reply via email to