I have been looking at how to build the _ldap module for Python 3.x. It's a pretty straightforward binary port except for one major headache with strings. So, here I'd like to propose and explain some API changes for Python 3.x users. I'm focused on the _ldap module right now, but the (pure Python) library modules will experience knock-on effects.

These are my goals with the _ldap module:
1. Python-ldap compiled for 2.x should continue to use int for message IDs and constants, but when compiled for 3.x it will use longs, since 3.x folds int into long.

Next, and much harder to deal with, is the loss of the 8-bit str(). _ldap relies heavily on str() objects to hold binary data for attribute values. But Python 3.0 does not have an 8-bit str(). Its str() is what 2.x called unicode(). For 8-bit data we have a new type called bytes(). The issue is that conversion between the two is not automatic.

For existing LDAP applications, I expect this to open up a world of porting pain. Lots of actual attribute values are ASCII, and having the values available in a string type has been a very handy convenience. But, strictly, it is wrong. LDAP attribute values are binary OCTET STRINGs, and in the unicode-only text world of Python 3.x, applications that want text will need to decode these binary strings in accordance with the attribute's schema. This problem is something I'd like feedback on.

Yet there is some good news. The text/binary problem appears to be restricted to attribute values and to some authentication parameters like SASL passwords. Attribute names, DNs and search filters are unaffected. This is because they are transmitted as the ASN.1 LDAPString type, which is a UTF-8 encoded OCTET STRING. So it makes sense for the _ldap API to accept unicode strings for these. But attribute values (OCTET STRING) surely must become bytes().

Using the bytes() type is going to cause much pain, with potentially lots of "TypeError: Can't convert 'bytes' object to str implicitly" messages everywhere. But it seems that this is part and parcel of porting to Python 3.x. If you really need strings, learn the encodings of your attributes and call str(value, "UTF-8") or str(value, "ASCII") to convert them.

So, my other proposed API changes are:

2. Python-ldap compiled for 2.x should continue to accept and produce strings as before.
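To make the decoding step concrete, here is a minimal sketch of what 3.x application code might look like under this proposal. It is an illustration only: the entry contents, the TEXT_ATTRS table and the decode_entry() helper are all hypothetical and are not part of _ldap or python-ldap. It assumes attribute names arrive as text and attribute values as bytes().

```python
# Hypothetical attribute dictionary, as an application might receive it
# under the proposed 3.x API: names are str, values are bytes.
attrs = {
    "cn": [b"Jos\xc3\xa9"],              # UTF-8 encoded text
    "mail": [b"jose@example.com"],       # plain ASCII text
    "jpegPhoto": [b"\xff\xd8\xff\xe0"],  # genuinely binary data
}

# Hypothetical application-level table: which attributes hold text,
# and the encoding each one uses. Anything not listed stays as bytes.
TEXT_ATTRS = {"cn": "UTF-8", "mail": "ASCII"}


def decode_entry(attrs, text_attrs=TEXT_ATTRS):
    """Return a copy of attrs with the known text attributes decoded to str."""
    decoded = {}
    for name, values in attrs.items():
        encoding = text_attrs.get(name)
        if encoding is None:
            # Unknown or binary attribute: keep the raw bytes untouched.
            decoded[name] = list(values)
        else:
            # Explicit conversion, since 3.x never does this implicitly.
            decoded[name] = [str(value, encoding) for value in values]
    return decoded


entry = decode_entry(attrs)
```

The point of the table is that the encoding choice is made once, per attribute, by the code that knows the schema, rather than being scattered through the application as ad hoc str() calls.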
A library class that automatically converts attribute values of type bytes() into various Python types via an attribute schema is possible, but at the _ldap level this is not necessary. It may even be better for an application tightly coupled to an LDAP schema to do this conversion itself.

In summary, python-ldap should have no API change visible to 2.x clients. 3.x clients, however, will need to use the bytes() type explicitly for attribute values.

d
_______________________________________________ Python-LDAP-dev mailing list Python-LDAP-dev@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/python-ldap-dev