Hi all,

I recently asked about the UCS2 / UCS4 binary compatibility issues
with Python on Guido's blog, and Guido suggested I continue the
discussion here:

http://www.artima.com/forums/flat.jsp?forum=106&thread=211430

The issue is that Python has a compile-time configuration setting
(whether Unicode strings are stored internally as UCS2 or UCS4) which
changes its ABI. For example, on Ubuntu we have:

$ objdump -T /usr/bin/python|grep UCS
080ac3e0 g    DF .text  00000206  Base        PyUnicodeUCS4_EncodeUTF8
080b2810 g    DF .text  000000ba  Base        PyUnicodeUCS4_DecodeLatin1
080b6c20 g    DF .text  000002b3  Base        PyUnicodeUCS4_RSplit
...

On some other systems, including Python compiled from source, you get:

$ objdump -T python|grep UCS
080abc80 g    DF .text  00000201  Base        PyUnicodeUCS2_EncodeUTF8
080b32e0 g    DF .text  000000c7  Base        PyUnicodeUCS2_DecodeLatin1
080b6740 g    DF .text  000002b9  Base        PyUnicodeUCS2_RSplit

(note "UCS2" vs "UCS4")
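
The same check can be made from inside the interpreter, without
objdump. On builds that have this setting, sys.maxunicode is 0xFFFF
for UCS2 and 0x10FFFF for UCS4. The following is only a sketch; the
helper name unicode_build is made up for illustration:

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as 'UCS2' or 'UCS4' from its maxunicode value.

    unicode_build is a hypothetical helper, not part of Python itself.
    With no argument it inspects the running interpreter.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    print(unicode_build())
```

An installer could run something like this before choosing which
binary of an extension to unpack, but that only works around the
problem rather than removing it.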

This means that I can't distribute Python extensions as binaries. An
extension built on Ubuntu links against the PyUnicodeUCS4_* symbols, so
it may fail to load on a system whose Python only exports the
PyUnicodeUCS2_* ones. I confess I haven't tried this recently, but it
has caused me trouble in the past. I'd like to be sure it won't happen
with Python 3.

I've hit this problem with both of the open source projects I work on:
the ROX desktop (http://rox.sf.net) and Zero Install
(http://0install.net).

ROX is a desktop environment. Most of our programs are written in
(pure) Python. Some, including ROX-Filer, are pure C. Sometimes it
would have been useful to combine the two: for example we could write
the pager applet in Python if it could use C to talk to the libwnck
library, or we could add Python scripting to the filer and gradually
migrate more of the code to Python.

Zero Install is a decentralised software installation system, itself
written entirely in Python, in which software authors publish
GPG-signed XML feed files on their websites. These feeds list versions
of their programs along with a cryptographic digest of each version's
contents (think Git tree IDs here). This allows installing software
without needing root access, while still sharing libraries and
programs automatically between (mutually suspicious) users. Although
we don't need to use C extensions for the system itself, distributing
Python/C hybrid programs with it has been problematic.

Another group having similar problems is the Autopackage project:

 http://trac.autopackage.org/wiki/LinuxProblems#Python
 http://trac.autopackage.org/wiki/PackagingPythonApps
 http://plan99.net/~mike/blog/2006/05/24/python-unicode-abi/

Finally, the issue has also been brought up before on the Python lists:
   http://mail.python.org/pipermail/python-dev/2005-September/056837.html

Guido suggested:

 "Why don't you distribute a Python interpreter binary built with the
right options? Depending on users having installed the correct Python
version (especially if your users are not programmers) is asking for
trouble."

There are several problems for us with this approach:

- We have to maintain our own version of Python, including pushing out
security updates.

- We also have to maintain all the Python modules, in particular
python-gnome, in a similar way.

- Our users have to download Python twice whenever there's a new release.

- If some programs are using the distribution's Python and some are
using ours, two copies of Python must be loaded into memory. This is
slow and wasteful of memory. (Libraries installed using Zero Install
are only used by software that was itself installed the same way;
distribution packages aren't affected.)

This assumes that all third-party code uses Zero Install for
distribution, so that only one extra version of Python is required.
People distributing programs by other means would also have to
include their own copies of Python, leading to even more waste.

From our point of view, it would be better if the format of strings
was an internal implementation detail. For most users, it doesn't
matter what the setting is, as long as the public interface doesn't
change! The cost of converting between formats is small, and in any
case most software outside of Python (the GNOME stack, for example)
uses UTF-8, so all strings have to be converted when going in or out
of Python anyway.
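
To illustrate the conversion at the boundary, here is a minimal
sketch. The function names to_external and from_external are made up
for this example, and it assumes the library on the other side expects
UTF-8 bytes:

```python
def to_external(text):
    """Encode a Python string as UTF-8 bytes for a UTF-8 based C library.

    Hypothetical helper: the result does not depend on how the
    interpreter stores the string internally.
    """
    return text.encode("utf-8")


def from_external(data):
    """Decode UTF-8 bytes received from outside Python into a string."""
    return data.decode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 \U0001F600"  # includes a character outside the BMP
    wire = to_external(sample)
    print(len(wire), from_external(wire) == sample)
```

Code on either side of this boundary never sees the internal
representation, which is the property I'd like the C API to have too.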

An alternative would be to default to UCS4, and give the option an
alarming name such as --with-unicode-for-space-limited-devices, so
that packagers don't mess with it.

Thanks,


-- 
Dr Thomas Leonard               http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000