Re: [Python-Dev] --enabled-shared broken on freebsd5?
I think this problem probably needs to move over to distutils-sig, as it doesn't seem to be specific to the way that Python itself uses distutils. distutils.command.build_ext tests for Py_ENABLE_SHARED on linux and solaris and automatically adds '.' to the library_dirs, and I suspect it just needs to do this on FreeBSD as well (adding bsd to the list of platforms for which this is performed solves the problem, but I don't pretend to know enough about either distutils or freebsd to determine if this is the correct solution). -- Nick ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] --enabled-shared broken on freebsd5?
(This may occur on more platforms - I can test on more unix platforms if the consensus is this is an actual problem and I'm not just a nut)

On freebsd5, if you do a simple ./configure --enable-shared in current (2.7) trunk, your python shared library will build properly, but all modules will fail to find the shared library and thus fail to build:

    gcc -shared build/temp.freebsd-5.3-RELEASE-i386-2.7/u1/Python/Python-2.7a1/Modules/_struct.o -L/u1/tmp/python2.7a1/lib -L/usr/local/lib -lpython2.7 -o build/lib.freebsd-5.3-RELEASE-i386-2.7/_struct.so
    /usr/bin/ld: cannot find -lpython2.7
    building '_ctypes_test' extension
    ...

This of course is because libpython2.7.so is in the current directory and not (yet) installed in /usr/local/lib. I've made a very simple fix for this problem that works, but at least to me smells a bit funny, which is to modify setup.py to add the following to detect_modules():

    # If we did --enable-shared, we need to be able to find the library
    # we just built in order to build the modules.
    if platform == 'freebsd5':
        add_dir_to_list(self.compiler_obj.library_dirs, '.')

Which brings me to a few questions:

a) Does this seem like a real problem, or am I missing something obvious?

b) Does this fix seem like the sensible thing to do? (it seems at least that we ought to check that the user configured --enable-shared and only set -L. in that case, if that's possible)

Setting --enable-shared when you actually have a libpython2.7.so in /usr/local/lib (or whatever --prefix you've selected) is possibly even more dangerous, because it may succeed in linking against a differently-built library than what you intended.

-- Nick
Re: [Python-Dev] --enabled-shared broken on freebsd5?
On Wed, Jan 6, 2010 at 16:14, Nicholas Bastin nick.bas...@gmail.com wrote:
> This of course is because libpython2.7.so is in the current directory and not (yet) installed in /usr/local/lib.

One minor correction - as you could see from the compile line, the actual --prefix in this case is /u1/tmp/python2.7a1, but the libraries obviously aren't installed there yet either.

Perhaps a better fix than setting -L. would be to put the shared library in build/lib.freebsd-5.3-RELEASE-i386-2.7 and add that to the library path for the linker (the build creates this directory, but installs nothing in it).

-- Nick
Re: [Python-Dev] --enabled-shared broken on freebsd5?
On Wed, Jan 6, 2010 at 17:21, Martin v. Löwis mar...@v.loewis.de wrote:
> > b) Does this fix seem like the sensible thing to do?
>
> No. Linking in setup.py should use the same options as if the module was built as *shared* through Modules/Setup, which, IIUC, should use BLDLIBRARY.

Thanks for that pointer, that makes much more sense. Indeed, BLDLIBRARY on FreeBSD* is set to '-L. -lpython$(VERSION)' if you set --enable-shared, but somehow that piece of information doesn't propagate into the module build. More investigation to be done...

-- Nick
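As a rough illustration of the point about BLDLIBRARY, the sketch below reads the two configure-time values discussed in this thread from an installed interpreter and pulls the -L entries out of the link flags. The helper names library_dirs_from_flags and build_link_info are made up for this example, and both variables may be missing on some platforms (Windows, for instance), in which case the sketch simply reports empty values. It is not what setup.py or distutils actually does.

```python
import sysconfig


def library_dirs_from_flags(flags):
    """Extract the directories named by -L options in a linker flag string."""
    # "-L." -> ".", "-L/usr/local/lib" -> "/usr/local/lib"; a bare "-L" is ignored.
    return [tok[2:] for tok in flags.split() if tok.startswith("-L") and len(tok) > 2]


def build_link_info():
    """Report how this interpreter was configured with respect to --enable-shared."""
    # Either variable can be None when the build did not record it.
    bldlibrary = sysconfig.get_config_var("BLDLIBRARY") or ""
    return {
        "enable_shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        "bldlibrary": bldlibrary,
        "library_dirs": library_dirs_from_flags(bldlibrary),
    }


if __name__ == "__main__":
    print(library_dirs_from_flags("-L. -lpython$(VERSION)"))
    print(build_link_info())
```

With the FreeBSD value quoted above, the only library directory recovered is '.', which is the directory the module build was failing to search.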
Re: [Python-Dev] Python developers are in demand
On 10/12/07, Guido van Rossum [EMAIL PROTECTED] wrote:
> I keep getting regular requests from people looking for Python coders (and this is in addition to Google asking me to hand over my contacts :-). This is good news because it suggests Python is on the uptake (always good to know). At the same time it is disturbing because apparently there aren't enough Python programmers out there. (At least none of them looking for work.) What's up with that?

At least from my perspective, all the jobs are in web applications, and all the Python developers I know are traditional applications programmers, not web developers.

-- Nick
Re: [Python-Dev] GC Changes
On 10/3/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Martin v. Löwis wrote:
> > For stack frames, such a registration is difficult to make efficient.
>
> Also very error-prone if you happen to miss one. Although maybe no more error-prone than getting the reference counting right.

Maybe, but reference counting is really easy to debug if you screw it up. This is probably one of the primary benefits of the majority of memory management being executed in reference counting - it's deterministic and easy to debug.

I'm not opposed to memory management being done entirely through garbage collection, but it would have to be vastly superior to the current system in both memory efficiency and performance.

-- Nick
Re: [Python-Dev] testing in a Python --without-threads build
Might expected skips be based on your current configuration instead of on what someone statically decided would be appropriate for your platform? Every new release I have to go through the 'unexpected skips' to determine that they're perfectly fine for how I configured python.

It seems that we ought to provide a mechanism for querying python for how the build was configured (although for non-unittest cases, failing to import some modules is usually sufficient information - knowing why they fail probably doesn't matter).

On 9/8/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> > I can't seem to run the regression tests in a --without-threads build. Might be interesting to configure a buildbot this way to keep ourselves honest.
> >
> > Because regrtest.py was importing test_socket_ssl without catching the ImportError exception:
>
> If that is the reason you cannot run it, then it seems it works just fine. There is nothing wrong with tests getting skipped.
>
> > So, is this an expected skip or not?
>
> No. IIUC, expected skips are a platform property. For your platform, support for threads is expected (whatever your platform is as long as it was built in this millennium).
>
> Regards,
> Martin
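To make the idea of asking the interpreter about its own build a little more concrete, here is a small sketch. It assumes the configure arguments are recorded under the CONFIG_ARGS variable, which is the case for typical POSIX builds but not guaranteed everywhere, and the helper names configure_args, module_available and skip_reason are invented for this example rather than taken from regrtest.

```python
import importlib.util
import sysconfig


def configure_args():
    """Return the recorded ./configure arguments, or '' if none were recorded."""
    return sysconfig.get_config_var("CONFIG_ARGS") or ""


def module_available(name):
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def skip_reason(name):
    """Explain a skip in terms of this build, or return None if no skip is needed."""
    if module_available(name):
        return None
    return "%s is not available in this build (configured with: %s)" % (
        name,
        configure_args() or "unknown",
    )


if __name__ == "__main__":
    for mod in ("threading", "ssl", "no_such_module_xyz"):
        print(mod, "->", skip_reason(mod))
```

A test runner built along these lines would decide whether a skip is expected from the build it is running in, rather than from a static per-platform list.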
[Python-Dev] Subversion checkout hanging?
I've had to blast my windows machine, as one is apparently required to do on occasion, and I'm trying to set up subversion again. I saved my private key file, and I can use plink -T to connect and I get:

    ( success ( 1 2 ( ANONYMOUS EXTERNAL ) ( edit-pipeline ) ) )

and that seems correct, and jibes with the FAQ at least. I've also edited my %APPDATA%/Subversion/config file, and I know it was the right one, because I screwed it up at first and it didn't work at all.. :-)

However, now I'm just getting a hang when I execute:

    svn checkout svn+ssh://[EMAIL PROTECTED]/python/trunk

I've only let it go for about 5 minutes so far, so maybe it's thinking about something, but I suspect it isn't... Anyone have any idea what I've done wrong?

-- Nick
Re: [Python-Dev] Subversion checkout hanging?
On 3/6/07, Georg Brandl [EMAIL PROTECTED] wrote:
> You could try to do ssh -vv [EMAIL PROTECTED] and see if the debug messages mean anything to you.

My problem is that SSH works fine if you just try to do that (well, with plink). It's subversion that doesn't seem to be working.

-- Nick
Re: [Python-Dev] Subversion checkout hanging?
I've fixed it. It appears that there was something wrong with Pageant, and removing my key and re-adding it solved the problem. The lack of any debugging info from subversion was very helpful in solving this problem.. :-)

Thanks for the help from those who responded.

-- Nick
Re: [Python-Dev] splitext('.cshrc')
On 3/6/07, Phillip J. Eby [EMAIL PROTECTED] wrote:
> At 07:24 PM 3/6/2007 +0100, Martin v. Löwis wrote:
> > given a list of file names, classify them for display (the way the Windows explorer works, and similar file managers). They use MIME databases and the like, and if they are unix-ish, they probably reject the current splitext implementation already as incorrect, and have work-arounds.
>
> I know I've written code like this that *depends* on the current behavior. It's *useful* to classify e.g. .svn directories or .*rc files by their extension, so I'm honestly baffled by the idea of wanting to treat such files as *not* having an extension (as opposed to a possibly-unrecognized one).

My argument would be that the file is not 'unnamed', with an extension of 'cshrc'. The file is actually called 'cshrc', and the '.' is metadata that is attached to tell the shell to hide the file. Assuming that we want to be ignorant of shell semantics (and I think we do), then the file is called '.cshrc', and it has no extension. The notion of an unnamed file with an extension I think would be very odd to most people.

+1 to changing the behaviour to return .cshrc as the filename, with no extension.

-- Nick
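For readers comparing the two positions, the short sketch below prints what os.path.splitext returns for a few names. It reflects the behaviour of Python 2.6 and later, where a leading dot is treated as part of the file name; the releases under discussion in this thread instead returned an empty root for '.cshrc'. The helper name classify is only for this example.

```python
import os.path


def classify(names):
    """Map each file name to the (root, extension) pair splitext reports."""
    return {name: os.path.splitext(name) for name in names}


if __name__ == "__main__":
    for name, parts in classify([".cshrc", ".svn", "README", "archive.tar.gz"]).items():
        print("%-16s %r" % (name, parts))
```

Dotfiles come back with an empty extension, while only the last suffix of 'archive.tar.gz' is split off.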
Re: [Python-Dev] Unifying trace and profile
On 2/21/06, Robert Brewer [EMAIL PROTECTED] wrote:
> 1. Allow trace hooks to receive c_call, c_return, and c_exception events (like profile does).

I can easily make this modification. You can also register the same bound method for trace and profile, which sort of eliminates this problem.

> 2. Allow profile hooks to receive line events (like trace does).

You really don't want this in the general case. Line events make profiling *really* slow, and they're not that accurate (although many thanks to Armin last year for helping me make them much more accurate). I guess what you require is to be able to selectively turn on events, thus eliminating the notion of 'trace' or 'profile' entirely, but I don't have a good idea of how to implement that at least as efficiently as the current system at the moment - I'm sure it could be done, I just haven't put any thought into it.

> 3. Expose new sys.gettrace() and getprofile() methods, so trace and profile functions that want to play nice can call sys.settrace/setprofile(None) only if they are the current hook.

Not a bad idea, although are you really running into this problem a lot?

> 4. Make the same move that sys.exitfunc -> atexit made (from a single function to multiple functions via registration), so multiple tracers/profilers can play nice together.

It seems very unlikely that you'll want to have a trace hook and profile hook installed at the same time, given the extreme unreliability this will introduce into the profiler.

> 5. Allow the core to filter on the event arg before hook(frame, event, arg) is called.

What do you mean by this, exactly? How would you use this feature?

> 6. Unify tracing and profiling, which would remove a lot of redundant code in ceval and sysmodule and free up some space in the PyThreadState struct to boot.

The more events you throw in, the slower profiling gets, however. Line events, while a nice thing to have, theoretically, would probably make a profiler useless. If you want to create line-by-line timing data, we're going to have to look for a more efficient way (like sampling).

> 7. As if the above isn't enough of a dream, it would be nice to have a bytecode tracer, which didn't bother with the f_lineno logic in maybe_call_line_trace, but just called the hook on every instruction.

I'm working on one, but given how much time I've had to work on my profiler in the last year, I'm not even going to guess when I'll get a real shot at looking at that.

My long-term goal is to eliminate profiling and tracing from the core interpreter entirely and implement the functionality in such a way that they don't cost you when not in use (i.e., implement profilers and debuggers which poke into the process from the outside, rather than be supported natively through events). This isn't impossible, but it's difficult because of the large variety of platforms. I have access to most of them, but again, my time is hugely constrained right now for python development work.

-- Nick
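The suggestion in the reply to point 1, registering one bound method as both the trace and the profile hook, can be sketched as follows. The EventCounter class and the work function are invented for this illustration. The uninstall step relies on sys.gettrace() and sys.getprofile(), which were only being proposed when this message was written and exist in later Python releases, so this shows the idea rather than what was possible at the time.

```python
import sys
from collections import Counter


class EventCounter:
    """Count every event delivered through the trace and profile hooks."""

    def __init__(self):
        self.events = Counter()

    def hook(self, frame, event, arg):
        self.events[event] += 1
        # As a trace function, returning the hook keeps 'line' events coming
        # for this frame; as a profile function the return value is ignored.
        return self.hook

    def install(self):
        sys.settrace(self.hook)
        sys.setprofile(self.hook)

    def uninstall(self):
        # Only remove a hook if it is still ours (point 3 in the list above).
        # Bound methods are compared with ==, since each attribute access
        # creates a new bound-method object.
        if sys.gettrace() == self.hook:
            sys.settrace(None)
        if sys.getprofile() == self.hook:
            sys.setprofile(None)


def work():
    return len("abc")


if __name__ == "__main__":
    counter = EventCounter()
    counter.install()
    work()
    counter.uninstall()
    print(dict(counter.events))
```

Because the same hook is installed twice, 'call' and 'return' are delivered once by each mechanism, while 'line' arrives only through tracing and 'c_call' only through profiling.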
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On 8/8/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Nicholas Bastin wrote:
> > It's a mature product. I would hope that that would count for something.
>
> Sure. But so is subversion.

I will then assume that you and I have different ideas of what 'mature' means.

> So I should then remove your offer to host a perforce installation, as you never made such an offer, right?

Correct.

> Yes. That's what this PEP is for. So I guess you are -1 on the PEP.

Not completely. More like -0 at the moment. We need a better system, but I think we shouldn't just pick a system because it's the one the PEP writer preferred - there should be some sort of effort to test a few systems (including bug trackers). I know this is work, but this isn't just something we can change easily again later.

-- Nick
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On 8/4/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Nicholas Bastin wrote:
> > Perforce is a commercial product, but it can be had for free for verified Open Source projects, which Python shouldn't have any problem with. There are other problems, like you have to renew the agreement every year, but it might be worth considering, given the fact that it's an excellent system.
>
> So we should consider it because it is an excellent system... I don't know what that means, in precise, day-to-day usage terms (i.e. what precisely would it do for us that, say, Subversion can't do).

It's a mature product. I would hope that that would count for something. I've had enough corrupted subversion repositories that I'm not crazy about the thought of using it in a production system. I know I'm not the only person with this experience. Sure, you can keep backups, and not really lose any work, but we're moving over because we have uptime and availability problems, so let's not just create them again.

> > > I think anything but Subversion is ruled out because:
> > > - there is no offer to host that anywhere (for subversion, there is already svn.python.org)
> >
> > We could host a Perforce repository just as easily, I would think.
>
> Interesting offer. I'll add this to the PEP - who is we in this context?

Uh, the Python community. Which is currently hosting a subversion repository, so it doesn't seem like a stretch to imagine that p4.python.org could exist just as easily.

> > > - there is no support for converting a CVS repository (for subversion, there is cvs2svn)
> >
> > I'd put $20 on the fact that cvs2svn will *not* work out of the box for converting the python repository. Just call it a hunch.
>
> You could have read the PEP before losing that money :-) It did work out of the box.

Pardon me if I don't feel that I'd like to see a system in production for a few weeks before we declare victory. The problems with this kind of conversion can be very subtle, and very painful.

I'm not saying we shouldn't do this, I'm just saying that we should take an appropriate measure of how much greener the grass really is on the other side, and how much work we're willing to put in to make it that way.

-- Nick
Re: [Python-Dev] [C++-sig] GCC version compatibility
On 7/12/05, Christoph Ludwig [EMAIL PROTECTED] wrote:
> If distutils builds C++ extensions with the C compiler then I consider this a bug in distutils because it is unlikely to work. (Unless the compiler can figure out from the source file suffixes in the compilation step *and* some info in the object files in the linking step that it is supposed to act like a C++ compiler. None of the compilers I am familiar with does the latter.) distutils should rather look for a C++ compiler in the PATH or explicitly ask the user to specify the command that calls the C++ compiler.

You practically always have to use --compiler with distutils when building C++ extensions anyhow, and even then it rarely does what I would consider 'The Right Thing(tm)'. The problem is that the core distutils assumption, that you want to build extension modules with the same compiler options that you built Python with, is in many cases the wrong thing to do for C++ extension modules, even if you built Python with --with-cxx. This is even worse on windows where the MSVC compiler, until very recently, was crap for C++, and you really needed to use another compiler for C++, but Python was always built using MSVC (unless you jumped through hoops of fire).

The problem is that this is much more complicated than it seems - you can't just ask the user for the C++ compiler, you really need to provide an abstraction layer for all of the compiler and linker flags, so that a user could specify what those flags are for their compiler of choice. Of course, once you've done that, the user might as well have just written a new Compiler class for distutils, which wouldn't pay any attention to how Python was built (other than where Python.h is).

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 9, 2005, at 12:59 AM, Martin v. Löwis wrote:
> > Wow, what an inane way of looking at it. I don't know what world you live in, but in my world, users read the configure options and suppose that they mean something. In fact, they *have* to go off on their own to assume something, because even the documentation you refer to above doesn't say what happens if they choose UCS-2 or UCS-4. A logical assumption would be that python would use those CEFs internally, and that would be incorrect.
>
> Certainly. That's why the documentation should be improved. Changing the option breaks existing packaging systems, and should not be done lightly.

I'm perfectly happy to continue supporting --enable-unicode=ucs2, but not displaying it as an option. Is that acceptable to you?

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 8, 2005, at 5:15 AM, Martin v. Löwis wrote:
> 'configure takes an option --enable-unicode, with the possible values ucs2, ucs4, yes (equivalent to no argument), and no (equivalent to --disable-unicode)'
>
> *THIS* documentation would break. This documentation is factually correct at the moment (configure does indeed take these options), and people rely on them in automatic build processes. Changing configure options should not be taken lightly, even if they may result from a wrong mental model. By that rule, --with-suffix should be renamed to --enable-suffix, --with-doc-strings to --enable-doc-strings, and so on. However, the nitpicking that underlies the desire to rename the option should be ignored in favour of backwards compatibility.
>
> Changing the documentation that goes along with the option would be fine.

That is exactly what I proposed originally, which you shot down. Please actually read the contents of my messages. What I said was change the configure option and related documentation.

> > It provides more than minimum value - it provides the truth.
>
> No. It is just a command line option. It could be named --enable-quirk=(quork|quark), and would still select UTF-16. Command line options provide no truth - they don't even provide statements.

Wow, what an inane way of looking at it. I don't know what world you live in, but in my world, users read the configure options and suppose that they mean something. In fact, they *have* to go off on their own to assume something, because even the documentation you refer to above doesn't say what happens if they choose UCS-2 or UCS-4. A logical assumption would be that python would use those CEFs internally, and that would be incorrect.

> > > With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start supporting the full Unicode ccs the same way it supports UCS-2.
> >
> > I can't understand what you mean by this. My point is that if you configure python to support UCS-2, then it SHOULD NOT support surrogate pairs. Supporting surrogate pairs is the purview of variable width encodings, and UCS-2 is not among them.
>
> So you suggest renaming it to --enable-unicode=utf16, right? My point is that a Unicode type with UTF-16 would correctly support all assigned Unicode code points, which the current 2-byte implementation doesn't. So --enable-unicode=utf16 would *not* be the truth.

The current implementation supports the UTF-16 CEF, i.e., it supports a variable width encoding form capable of representing all of the unicode space using surrogate pairs. Please point out a code point that the current 2 byte implementation does not support, either directly, or through the use of surrogate pairs.

-- Nick
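For readers unfamiliar with the mechanism being argued over, the sketch below shows how a code point is turned into UTF-16 code units. Code points up to U+FFFF occupy one 16-bit unit, which is where UTF-16 and UCS-2 coincide, and anything above that is split into a high and a low surrogate. The function name utf16_code_units is made up for this example, and the arithmetic follows the standard UTF-16 definition rather than any particular piece of CPython code.

```python
import struct


def utf16_code_units(ch):
    """Return the list of 16-bit UTF-16 code units for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # Inside the Basic Multilingual Plane: one unit, identical to UCS-2.
        return [cp]
    # Outside the BMP: subtract 0x10000 and split the remaining 20 bits
    # into a high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001D11E", "\U0010FFFF"):
        units = utf16_code_units(ch)
        print("U+%06X -> %s" % (ord(ch), " ".join("%04X" % u for u in units)))
    # Cross-check one value against Python's own UTF-16 codec.
    print(struct.unpack(">2H", "\U0001D11E".encode("utf-16-be")))
```

The last line compares the hand-computed pair for U+1D11E with what the built-in codec produces.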
Re: [Python-Dev] New Py_UNICODE doc
On May 8, 2005, at 1:44 PM, Martin v. Löwis wrote:
> Shane Hathaway wrote:
> > Fair enough. The original point is that the documentation is unclear about what a Py_UNICODE[] contains. I deduced that it contains either UCS2 or UCS4 and implemented accordingly. Not only did I guess wrong, but others will probably guess wrong too. Something in the docs needs to spell this out.
>
> Again, patches are welcome. I was opposed to Nick's proposed changes, since they explicitly said that you are not supposed to know what is in a Py_UNICODE. Integrating the essence of PEP 261 into the main documentation would be a worthwhile task.

You can't possibly assume you know specifically what's in a Py_UNICODE in any given python installation. If someone thinks this statement is untrue, please explain why.

I realize you might not *want* that to be true, but it is. Users are free to configure their python however they desire, and if that means --enable-unicode=ucs2 on RH9, then that is perfectly valid.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:
> Nicholas Bastin wrote:
> > --enable-unicode=ucs2
> >
> > be replaced with:
> >
> > --enable-unicode=utf16
> >
> > and the docs be updated to reflect more accurately the variance of the internal storage type.
>
> -1. This breaks existing documentation and usage, and provides only minimum value.

Have you been missing this conversation? UTF-16 is *WHAT PYTHON CURRENTLY IMPLEMENTS*. The current documentation is flat out wrong. Breaking that isn't a big problem in my book.

It provides more than minimum value - it provides the truth.

> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start supporting the full Unicode ccs the same way it supports UCS-2. Individual surrogate values remain accessible, and supporting non-BMP characters is left to the application (with the exception of the UTF-8 codec).

I can't understand what you mean by this. My point is that if you configure python to support UCS-2, then it SHOULD NOT support surrogate pairs. Supporting surrogate pairs is the purview of variable width encodings, and UCS-2 is not among them.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
> However, I don't understand all the excitement about Py_UNICODE: if you don't like the way this Python typedef works, you are free to interface to Python using any of the supported encodings using PyUnicode_Encode() and PyUnicode_Decode(). I'm sure you'll find one that fits your needs and if not, you can even write your own codec and register it with Python, e.g. UTF-32 which we currently don't support ;-)

My concerns about Py_UNICODE are completely separate from my frustration that the documentation is wrong about this type. It is much more important that the documentation be correct, first, and then we can discuss the reasons why it can be one of two values, rather than just a uniform value across all python implementations. This makes distributing binary extension modules hard. It has become clear to me that no one on this list gives a *%^ about people attempting to distribute binary extension modules, or they would have cared about this problem, so I'll just drop that point.

However, somehow, what keeps getting lost in the mix is that --enable-unicode=ucs2 is a lie, and we should change what this configure option says. Martin seems to disagree with me, for reasons that I don't understand. I would be fine with calling the option utf16, or just 2 and 4, but not ucs2, as that means things that Python doesn't intend it to mean.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:25 AM, M.-A. Lemburg wrote:
> I don't see why you shouldn't use Py_UNICODE buffer directly. After all, the reason why we have that typedef is to make it possible to program against an abstract type - regardless of its size on the given platform.

Because the encoding of that buffer appears to be different depending on the configure options. If that isn't true, then someone needs to change the doc, and the configure options. Right now, it seems *very* clear that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the configure help, and you can't use the buffer directly if the encoding is variable. However, you seem to be saying that this isn't true.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:
> You've got that wrong: Python lets you choose UCS-4 - UCS-2 is the default.

No, that's not true. Python lets you choose UCS-4 or UCS-2. What the default is depends on your platform. If you run raw configure, some systems will choose UCS-4, and some will choose UCS-2. This is how the conversation came about in the first place - running ./configure on RHL9 gives you UCS-4.

-- Nick
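Since the default differs between systems, the practical question for a user is which variant a given interpreter was built with. A minimal sketch of one way to check from Python is shown below, using sys.maxunicode, which reports 0xFFFF on builds with a 2-byte Py_UNICODE and 0x10FFFF on builds with a 4-byte one. The function name unicode_build is invented for this example. Note that from Python 3.3 onward the configure option no longer exists and sys.maxunicode is always 0x10FFFF, so the distinction only matters for the older releases discussed here.

```python
import sys


def unicode_build():
    """Describe the Unicode storage width implied by sys.maxunicode."""
    if sys.maxunicode > 0xFFFF:
        return "wide: every code point fits in one storage unit"
    return "narrow: non-BMP code points need surrogate pairs"


if __name__ == "__main__":
    print("sys.maxunicode = 0x%X" % sys.maxunicode)
    print(unicode_build())
```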
Re: [Python-Dev] New Py_UNICODE doc (Another Attempt)
After reading through the code and the comments in this thread, I propose the following in the documentation as the definition of Py_UNICODE:

    This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size or native encoding of this type on any given platform.

The main point here is that extension developers cannot safely slam Py_UNICODE (which it appeared was true when the documentation stated that it was always 16-bits).

I don't propose that we put this information in the doc, but the possible internal representations are:

    2-byte wchar_t or unsigned short encoded as UTF-16
    4-byte wchar_t encoded as UTF-32 (UCS-4)

If you do not explicitly set the configure option, you cannot guarantee which you will get. Python also does not normalize the byte order of unicode strings passed into it from C (via PyUnicode_EncodeUTF16, for example), so it is possible to have UTF-16LE and UTF-16BE strings in the system at the same time, which is a bit confusing. This may or may not be worth a mention in the doc (or a patch).

-- Nick
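The byte-order remark can be illustrated from the Python side with the standard UTF-16 codecs. This is only a sketch of why mixing little-endian and big-endian data is confusing and says nothing about what the C API does internally. The helper name decode_both is made up for the example.

```python
def decode_both(data):
    """Decode the same bytes as UTF-16LE and as UTF-16BE."""
    return data.decode("utf-16-le"), data.decode("utf-16-be")


if __name__ == "__main__":
    little = "A".encode("utf-16-le")   # bytes 41 00
    big = "A".encode("utf-16-be")      # bytes 00 41
    print(little, big)
    # The same two bytes read with the wrong byte order give a different,
    # perfectly valid character instead of an error.
    as_le, as_be = decode_both(little)
    print(repr(as_le), "U+%04X" % ord(as_be))
```

Nothing in the bytes themselves says which order was intended, which is why a byte order mark or an out-of-band convention is needed whenever both forms can appear.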
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:42 PM, James Y Knight wrote:
> On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote:
> > If this is the case, then we're clearly misleading users. If the configure script says UCS-2, then as a user I would assume that surrogate pairs would *not* be encoded, because I chose UCS-2, and it doesn't support that. I would assume that any UTF-16 string I would read would be transcoded into the internal type (UCS-2), and information would be lost. If this is not the case, then what does the configure option mean?
>
> It means all the string operations treat strings as if they were UCS-2, but that in actuality, they are UTF-16. Same as the case in the windows APIs and Java. That is, all string operations are essentially broken, because they're operating on encoded bytes, not characters, but claim to be operating on characters.

Well, this is a completely separate issue/problem. The internal representation is UTF-16, and should be stated as such. If the built-in methods actually don't work with surrogate pairs, then that should be fixed.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 5:21 PM, Shane Hathaway wrote:

> Nicholas Bastin wrote:
>> On May 6, 2005, at 3:42 PM, James Y Knight wrote:
>>> It means all the string operations treat strings as if they were UCS-2, but that in actuality, they are UTF-16. Same as the case in the windows APIs and Java. That is, all string operations are essentially broken, because they're operating on encoded bytes, not characters, but claim to be operating on characters.
>>
>> Well, this is a completely separate issue/problem. The internal representation is UTF-16, and should be stated as such. If the built-in methods actually don't work with surrogate pairs, then that should be fixed.
>
> Wait... are you saying a Py_UNICODE array contains either UTF-16 or UTF-32 characters, but never UCS-2? That's a big surprise to me. I may need to change my PyXPCOM patch to fit this new understanding. I tried hard to not care how Python encodes unicode characters, but details like this are important when combining two frameworks with different unicode APIs.

Yes. Well, inasmuch as a large part of UTF-16 directly overlaps UCS-2, sometimes unicode strings contain UCS-2 characters. However, characters which would not be legal in UCS-2 are still encoded properly in python, in UTF-16.

And yes, I feel your pain, that's how I *got* into this position. Mapping from external unicode types is an important aspect of writing extension modules, and the documentation does not help people trying to do this. The fact that python's internal encoding is variable is a huge problem in and of itself, even if that was documented properly. This is why tools like Xerces and ICU will be happy to give you whatever form of unicode strings you want, but internally they always use UTF-16 - to avoid having to write two internal implementations of the same functionality.

If you look up and down Objects/unicodeobject.c you'll see a fair amount of code written a couple of different ways (using #ifdefs) because of the variability in the internal representation.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 7:43 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> If this is the case, then we're clearly misleading users. If the configure script says UCS-2, then as a user I would assume that surrogate pairs would *not* be encoded, because I chose UCS-2, and it doesn't support that.
>
> What do you mean by that? That the interpreter crashes if you try to store a low surrogate into a Py_UNICODE?

What I mean is pretty clear. UCS-2 does *NOT* support surrogate pairs. If it did, it would be called UTF-16. If Python really supported UCS-2, then surrogate pairs from UTF-16 inputs would either get turned into two garbage characters, or the "I couldn't transcode this" UCS-2 code point (I don't remember which one that is off the top of my head).

>> I would assume that any UTF-16 string I would read would be transcoded into the internal type (UCS-2), and information would be lost. If this is not the case, then what does the configure option mean?
>
> It tells you whether you have the two-octet form of the Universal Character Set, or the four-octet form.

It would, if that were the case, but it's not. Setting UCS-2 in the configure script really means UTF-16, and as such, the documentation should reflect that.

-- Nick
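The UCS-2 versus UTF-16 distinction argued above comes down to what happens to code points beyond U+FFFF. As a rough sketch (the function names are illustrative and not part of any Python API), the standard UTF-16 surrogate-pair arithmetic looks like this:

```python
def fits_in_ucs2(code_point):
    """True if a single 16-bit code unit can hold this code point (BMP only)."""
    return 0 <= code_point <= 0xFFFF


def to_utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable characters")
    if code_point <= 0xFFFF:
        return [code_point]                 # one code unit, same as UCS-2
    offset = code_point - 0x10000           # 20 bits remain to be split
    high = 0xD800 + (offset >> 10)          # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits -> low surrogate
    return [high, low]


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
units = to_utf16_code_units(0x1D11E)
print([hex(u) for u in units], fits_in_ucs2(0x1D11E))
```

A strict UCS-2 implementation has no second branch: anything above U+FFFF would have to be rejected or replaced, which is the behaviour the message above says the configure option name implies.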
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 7:45 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Because the encoding of that buffer appears to be different depending on the configure options.
>
> What makes it appear so? sizeof(Py_UNICODE) changes when you change the option - does that, in your mind, mean that the encoding changes?

Yes. Not only in my mind, but in the Python source code. If Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), otherwise the encoding is UTF-16 (*not* UCS-2).

If that isn't true, then someone needs to change the doc, and the configure options.

>> Right now, it seems *very* clear that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the configure help, and you can't use the buffer directly if the encoding is variable. However, you seem to be saying that this isn't true.
>
> It's a compile-time option (as all configure options). So at run-time, it isn't variable.

What I mean by 'variable' is that you can't make any assumption as to what the size will be in any given python when you're writing (and building) an extension module. This breaks binary compatibility of extension modules on the same platform and same version of python across interpreters which may have been built with different configure options.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 8:25 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Yes. Not only in my mind, but in the Python source code. If Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), otherwise the encoding is UTF-16 (*not* UCS-2).
>
> I see. Some people equate "encoding" with "encoding scheme"; neither UTF-32 nor UTF-16 is an encoding scheme. You were

That's not true. UTF-16 and UTF-32 are both CES and CEF (although this is not true of UTF-16LE and BE). UTF-32 is a fixed-width encoding form within a code space of (0..10FFFF) and UTF-16 is a variable-width encoding form which provides a mix of one or two 16-bit code units in the code space of (0..FFFF). However, you are perhaps right to point out that people should be more explicit as to which they are referring to. UCS-2, however, is only a CEF, and thus I thought it was obvious that I was referring to UTF-16 as a CEF.

I would point anyone who is confused on this point to Unicode Technical Report #17 on the Character Encoding Model, which is much clearer than trying to piece together the relevant parts out of the entire standard.

In any event, Python's use of the term UCS-2 is incorrect. I quote from the TR:

    The UCS-2 encoding form, which is associated with ISO/IEC 10646 and can only express characters in the BMP, is a fixed-width encoding form.

immediately followed by:

    In contrast, UTF-16 uses either one or two code units and is able to cover the entire code space of Unicode.

If Python is capable of representing the entire code space of Unicode when you choose --unicode=ucs2, then that is a bug. It either should not be called UCS-2, or the interpreter should be bound by the limitations of the UCS-2 CEF.

>> What I mean by 'variable' is that you can't make any assumption as to what the size will be in any given python when you're writing (and building) an extension module. This breaks binary compatibility of extensions modules on the same platform and same version of python across interpreters which may have been built with different configure options.
>
> True. The breakage will be quite obvious, in most cases: the module fails to load because not only sizeof(Py_UNICODE) changes, but also the names of all symbols change.

Yes, but the important question here is why would we want that? Why doesn't Python just have *one* internal representation of a Unicode character? Having more than one possible definition just creates problems, and provides no value.

-- Nick
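To make the fixed-width versus variable-width contrast from the TR concrete, here is a small sketch that counts code units for the same text under each encoding form. It uses only the standard codecs; the helper name code_unit_count is an assumption for illustration.

```python
def code_unit_count(text, unit_bytes):
    """Count code units when text is encoded with 2-byte or 4-byte units."""
    codecs_by_width = {2: "utf-16-le", 4: "utf-32-le"}
    encoded = text.encode(codecs_by_width[unit_bytes])
    return len(encoded) // unit_bytes


text = "a\U0001D11E"  # one BMP character plus one supplementary character

utf16_units = code_unit_count(text, 2)   # variable width: 1 + 2 code units
utf32_units = code_unit_count(text, 4)   # fixed width: 1 + 1 code units
print(utf16_units, utf32_units)
```

Code that indexes a 16-bit buffer by "character" position is therefore only correct as long as no supplementary characters are present.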
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 8:11 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Well, this is a completely separate issue/problem. The internal representation is UTF-16, and should be stated as such. If the built-in methods actually don't work with surrogate pairs, then that should be fixed.
>
> Yes to the former, no to the latter. PEP 261 specifies what should and shouldn't work.

This PEP has several textual errors and ambiguities (which, admittedly, may have been a necessary state given the unicode standard in 2001). However, putting that aside, I would recommend that:

    --enable-unicode=ucs2

be replaced with:

    --enable-unicode=utf16

and the docs be updated to reflect more accurately the variance of the internal storage type.

I would also like the community to strongly consider standardizing on a single internal representation, but I will leave that fight for another day.

-- Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size of this type on any given platform.
>
> But people want to know "Is Python's Unicode 16-bit or 32-bit?" So the documentation should explicitly say "it depends".

The important piece of information is that it is not guaranteed to be a particular one of those sizes. Once you can't guarantee the size, no one really cares what size it is. The documentation should discourage developers from attempting to manipulate Py_UNICODE directly, which, other than trivia, is the only reason why someone would care what size the internal representation is.

-- Nick
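For anyone who does want the "it depends" answer at run time, the build could be probed from Python through sys.maxunicode. This is only a sketch: the function name is made up, and it assumes the convention that a 16-bit build reports 0xFFFF. On Python 3.3 and later sys.maxunicode is always 0x10FFFF, so there the check only ever reports the wide case.

```python
import sys


def unicode_build_width():
    """Return 2 for a 16-bit ("narrow") build, 4 otherwise.

    Assumption: a narrow build reports sys.maxunicode == 0xFFFF.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


print(unicode_build_width(), hex(sys.maxunicode))
```

This tells a script which kind of interpreter it is running on, but it does nothing for a compiled extension, which still has to be built against the matching configuration.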
Re: [Python-Dev] Py_UNICODE madness
On May 4, 2005, at 4:39 AM, M.-A. Lemburg wrote:

>> At the very least, if we can't guarantee the internal representation, then the PyUnicode_FromUnicode API needs to go away, and be replaced with something capable of transcoding various unicode inputs into the internal python representation.
>
> We have PyUnicode_Decode() for that. PyUnicode_FromUnicode is useful and meant for working directly on Py_UNICODE buffers.

Is this API documented anywhere? (It's not in the Unicode Object section of the API doc). Also, this is quite inefficient if the source data is in UTF-16, because it appears that I'll have to transcode my data to utf-8 before I can pass it to this function, but I guess I'll have to live with that.

-- Nick
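A decode step does not have to go through UTF-8 when the source is already UTF-16; at the C level there is also PyUnicode_DecodeUTF16, which takes the raw bytes and a byte-order hint. The following Python-level sketch shows the same idea under the assumption that the caller holds a sequence of 16-bit code units (as an unsigned short buffer would); the function name is hypothetical.

```python
import struct


def decode_utf16_units(units):
    """Build a str from a sequence of 16-bit UTF-16 code units.

    The units are packed little-endian and decoded with the matching
    explicit codec, so the result does not depend on the host byte order.
    """
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le")


# 'A' followed by the surrogate pair for U+1D11E.
text = decode_utf16_units([0x0041, 0xD834, 0xDD1E])
print(len(text), [hex(ord(c)) for c in text])
```

The decoder, not the caller, is then responsible for mapping the input onto whatever the interpreter uses internally.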
Re: [Python-Dev] New Py_UNICODE doc
On May 4, 2005, at 1:02 PM, Michael Hudson wrote:

> Nicholas Bastin writes:
>> The current documentation for Py_UNICODE states:
>>
>>     This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for unsigned short.
>>
>> I propose changing this to:
>>
>>     This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. On platforms where wchar_t is available, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility.
>
> This just isn't true. Have you read ./configure --help recently?

Ok, so the above statement is true if the user does not set --enable-unicode=ucs[24] (I was reading the wchar_t test in configure.in, and not the generated configure help).

Alternatively, we shouldn't talk about the size at all, and just leave the first and last sentences:

    This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size of this type on any given platform.

-- Nick
Re: [Python-Dev] Py_UNICODE madness
On May 3, 2005, at 6:44 PM, Guido van Rossum wrote:

> I think that documentation is wrong; AFAIK Py_UNICODE has always been allowed to be either 16 or 32 bits, and the source code goes through great lengths to make sure that you get a link error if you try to combine extensions built with different assumptions about its size.

That makes PyUnicode_FromUnicode() a lot less useful. Well, really, not useful at all.

You might suggest that PyUnicode_FromWideChar is more useful, but that's only true on platforms that support wchar_t.

Is there no universally supported way of moving buffers of unicode data (as common data types, like unsigned short, etc.) into Python from C?

-- Nick
[Python-Dev] PyCallable_Check redeclaration
Why is PyCallable_Check declared in both object.h and abstract.h? It appears that it's been this way for quite some time (exists in both 2.3.4 and 2.4.1).

-- Nick
Re: [Python-Dev] Unicode byte order mark decoding
On Apr 7, 2005, at 5:07 AM, M.-A. Lemburg wrote:

>> The current implementation of the utf-16 codecs makes for some irritating gymnastics to write the BOM into the file before reading it if it contains no BOM, which seems quite like a bug in the codec.
>
> The codec writes a BOM in the first call to .write() - it doesn't write a BOM before reading from the file.

Yes, see, I read a *lot* of UTF-16 that comes from other sources. It's not a matter of writing with python and reading with python.

-- Nick
Re: [Python-Dev] Unicode byte order mark decoding
On Apr 7, 2005, at 11:35 AM, M.-A. Lemburg wrote:

> Ok, but I don't really follow you here: you are suggesting to relax the current UTF-16 behavior and to start defaulting to UTF-16-BE if no BOM is present - that's most likely going to cause more problems than it seems to solve: namely complete garbage if the data turns out to be UTF-16-LE encoded and, what's worse, enters the application undetected.

The crux of my argument is that the spec declares that UTF-16 without a BOM is BE. If the file is encoded in UTF-16LE and it doesn't have a BOM, it doesn't deserve to be processed correctly. That being said, treating it as UTF-16BE if it's LE will result in a lot of invalid code points, so it should be obvious that something has gone wrong.

> If you do have UTF-16 without a BOM it's much better to let a short function analyze the text by reading the first few bytes of the file and then make an educated guess based on the findings. You can then process the file using one of the other codecs UTF-16-LE or -BE.

This is about what we do now - we catch UnicodeError and then add a BOM to the file, and read it again. We know our files are UTF-16BE if they don't have a BOM, as the files are written by code which observes the spec. We can't use UTF-16BE all the time, because sometimes they're UTF-16LE, and in those cases the BOM is set.

It would be nice if you could optionally specify that the codec would assume UTF-16BE if no BOM was present, and not raise UnicodeError in that case, which would preserve the current behaviour as well as allow users to ask for behaviour which conforms to the standard.

I'm not saying that you can't work around the issue now, what I'm saying is that you shouldn't *have* to - I think there is a reasonable expectation that the UTF-16 codec conforms to the spec, and if you wanted it to do something else, it is those users who should be forced to come up with a workaround.

-- Nick
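The "short function" suggested in the quoted text could look something like the sketch below. It is a heuristic only, and it assumes the sample is mostly ASCII-range text, where every code unit has one zero byte; the function name is hypothetical and the guess can be wrong for other scripts.

```python
def guess_utf16_byte_order(sample):
    """Guess the byte order of BOM-less UTF-16 from where zero bytes fall.

    Returns "utf-16-be", "utf-16-le", or None when the sample gives no hint.
    Assumes mostly ASCII-range text, so the high byte of each unit is zero.
    """
    even_zeros = sum(1 for b in sample[0::2] if b == 0)
    odd_zeros = sum(1 for b in sample[1::2] if b == 0)
    if even_zeros > odd_zeros:
        return "utf-16-be"   # high (zero) byte comes first in each unit
    if odd_zeros > even_zeros:
        return "utf-16-le"   # high (zero) byte comes second in each unit
    return None


data = "hello".encode("utf-16-le")
codec = guess_utf16_byte_order(data[:64])
print(codec, data.decode(codec or "utf-16-be"))
```

The fallback to big-endian when the guess is inconclusive follows the reading of the standard argued for above, rather than anything the heuristic itself can justify.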
Re: [Python-Dev] Unicode byte order mark decoding
On Apr 5, 2005, at 6:19 AM, M.-A. Lemburg wrote:

> Note that the UTF-16 codec is strict w/r to the presence of the BOM: you get a UnicodeError if a stream does not start with a BOM. For the UTF-8-SIG codec, this should probably be relaxed to not require the BOM.

I've actually been confused about this point for quite some time now, but never had a chance to bring it up. I do not understand why UnicodeError should be raised if there is no BOM. I know that PEP-100 says:

    'utf-16': 16-bit variable length encoding (little/big endian)

and:

    Note: 'utf-16' should be implemented by using and requiring byte order marks (BOM) for file input/output.

But this appears to be in error, at least in the current unicode standard. 'utf-16', as defined by the unicode standard, is big-endian in the absence of a BOM:

---
3.10.D42: UTF-16 encoding scheme:
...
* The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian.
---

The current implementation of the utf-16 codecs makes for some irritating gymnastics to write the BOM into the file before reading it if it contains no BOM, which seems quite like a bug in the codec. I allow for the possibility that this was ambiguous in the standard when the PEP was written, but it is certainly not ambiguous now.

-- Nick
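The rule quoted from the standard is simple enough to sketch directly. The function below is an illustration of that rule, not the behaviour of Python's own 'utf-16' codec; the name decode_utf16_default_be is made up, and only the BOM constants from the codecs module are relied on.

```python
import codecs


def decode_utf16_default_be(data):
    """Decode UTF-16 bytes, honouring a BOM and assuming big-endian without one."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM and no higher-level protocol: big-endian, per the quoted D42.
    return data.decode("utf-16-be")


no_bom = "hi".encode("utf-16-be")
with_le_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(decode_utf16_default_be(no_bom), decode_utf16_default_be(with_le_bom))
```

With a helper like this there is no need to rewrite the input with a BOM prepended before reading it.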
Re: [Python-Dev] SWT PyCon Sprint?
On Mar 10, 2005, at 11:00 AM, Phillip J. Eby wrote:

> At 01:38 AM 3/10/05 -0500, Nicholas Bastin wrote:
>> I realize that this is exceedingly late in the game, but is anybody interested in doing a Write-Python-Bindings-for-SWT sprint? It's been brought up before in various places, and PyCon seems the likely place to get enough concentrated knowledge to actually get it kicked off and somewhat working...
>
> I'm certainly interested in the concept in general, though I'm curious whether the planned approach is a GCJ/SWIG wrapper, or a javaclass (bytecode translation)+ctypes dynamic approach. I'm somewhat more interested in the latter approach, as I find C++ a bit of a pain with respect to buildability.

I'm open to either approach. I don't know a lot about JNI, so I was hoping somebody would come along for the ride who could answer certain questions about how SWT is implemented. A third option would be to grovel over SWT and implement identical functionality in Python (pure-python plus ctypes), and make a mirror implementation, rather than a wrapper. What approach we take depends largely on who shows up and what they feel comfortable with.

An additional complication is that SWT is a different package on each platform, so it's not so much "port SWT to Python" as "port SWT-windows to Python", "port SWT-Mac to Python", etc. The API is identical for each platform, however, so depending on the level at which you wrapped it, this is only a build problem.

> (I assume that you're talking about porting to a JVM-less CPython extension, since if you were to leave it in Java you could just use Jython or one of the binary Python-Java bridges to access SWT as-is.)

Yes, JVM-less CPython extension would be the plan.

-- Nick
[Python-Dev] SWT PyCon Sprint?
I realize that this is exceedingly late in the game, but is anybody interested in doing a Write-Python-Bindings-for-SWT sprint? It's been brought up before in various places, and PyCon seems the likely place to get enough concentrated knowledge to actually get it kicked off and somewhat working...

-- Nick