Hi all, I recently asked about the UCS2 / UCS4 binary compatibility issues with Python on Guido's blog, and Guido suggested I continue the discussion here:
http://www.artima.com/forums/flat.jsp?forum=106&thread=211430

The issue is that Python has a compile-time configuration setting which changes its ABI. For example, on Ubuntu we have:

  $ objdump -T /usr/bin/python|grep UCS
  080ac3e0 g DF .text 00000206 Base PyUnicodeUCS4_EncodeUTF8
  080b2810 g DF .text 000000ba Base PyUnicodeUCS4_DecodeLatin1
  080b6c20 g DF .text 000002b3 Base PyUnicodeUCS4_RSplit
  ...

Whereas on some other systems, including compiled-from-source Python, you get:

  $ objdump -T python|grep UCS
  080abc80 g DF .text 00000201 Base PyUnicodeUCS2_EncodeUTF8
  080b32e0 g DF .text 000000c7 Base PyUnicodeUCS2_DecodeLatin1
  080b6740 g DF .text 000002b9 Base PyUnicodeUCS2_RSplit

(note "UCS2" vs "UCS4")

This means that I can't distribute Python extensions as binaries. Any extension built on Ubuntu may fail on some other system. I confess I haven't tried this recently, but it has caused me trouble in the past. I'd like to be sure it won't happen with Python 3.

I've hit this problem with both of the open source projects I work on; the ROX desktop (http://rox.sf.net) and Zero Install (http://0install.net).

ROX is a desktop environment. Most of our programs are written in (pure) Python. Some, including ROX-Filer, are pure C. Sometimes it would have been useful to combine the two: for example we could write the pager applet in Python if it could use C to talk to the libwnck library, or we could add Python scripting to the filer and gradually migrate more of the code to Python.

Zero Install is a decentralised software installation system, itself written entirely in Python, in which software authors publish GPG-signed XML feed files on their websites. These feeds list versions of their programs along with a cryptographic digest of each version's contents (think GIT tree IDs here). This allows installing software without needing root access, while still sharing libraries and programs automatically between (mutually suspicious) users.
Although we don't need to use C extensions for the system itself, distributing Python/C hybrid programs with it has been problematic.

Another group having similar problems is the Autopackage project:

http://trac.autopackage.org/wiki/LinuxProblems#Python
http://trac.autopackage.org/wiki/PackagingPythonApps
http://plan99.net/~mike/blog/2006/05/24/python-unicode-abi/

Finally, the issue has also been brought up before on the Python lists:

http://mail.python.org/pipermail/python-dev/2005-September/056837.html

Guido suggested:

"Why don't you distribute a Python interpreter binary built with the right options? Depending on users having installed the correct Python version (especially if your users are not programmers) is asking for trouble."

There are several problems for us with this approach:

- We have to maintain our own version of Python, including pushing out security updates.

- We also have to maintain all the Python modules, in particular python-gnome, in a similar way.

- Our users have to download Python twice whenever there's a new release.

- If some programs are using the distribution's Python and some are using ours (libraries installed using Zero Install are only used by software itself installed the same way; distribution packages aren't affected), two copies of Python must be loaded into memory. This is slow and wasteful of memory.

This is assuming all third-party code uses Zero Install for distribution, so that only one extra version of Python is required. For people distributing programs by other means, they would also have to include their own copies of Python, leading to even more waste.

From our point of view, it would be better if the format of strings was an internal implementation detail. For most users, it doesn't matter what the setting is, as long as the public interface doesn't change!
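
As a rough illustration of that last point, here is a minimal sketch (not taken from any existing tool) of checking at the Python level which setting an interpreter was built with. The helper names unicode_build_width and to_utf8 are made up for this example; sys.maxunicode is the standard attribute. The UTF-8 output is the same whichever setting was used, which is all most users of the public interface would ever see.

```python
import sys


def unicode_build_width():
    """Return 2 for a narrow (UCS2) build and 4 for a wide (UCS4) build.

    Assumption for this sketch: a narrow build reports
    sys.maxunicode == 0xFFFF, and anything else is treated as wide.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


def to_utf8(text):
    """Encode text as UTF-8; the bytes do not depend on the internal width."""
    return text.encode("utf-8")


if __name__ == "__main__":
    width = unicode_build_width()
    encoded = to_utf8(u"caf\u00e9")
    print("UCS%d build, UTF-8 bytes: %r" % (width, encoded))
```

An extension that only exchanged UTF-8 with the interpreter, rather than touching the internal buffer, would not need to know the result of the first function at all.
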
The cost of converting between formats is small, and in any case most software outside of Python (the GNOME stack, for example) uses UTF-8, so all strings have to be converted when going in or out of Python anyway.

An alternative would be to default to UCS4, and give the option an alarming name such as --with-unicode-for-space-limited-devices or something so that packagers don't mess with it.

Thanks,

-- 
Dr Thomas Leonard		http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000