Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-24 Thread Floris Bruynooghe
On Sat, Jan 23, 2010 at 10:09:14PM +0100, Cesare Di Mauro wrote: Introducing C++ is a big step, also. Aside from the problems it can bring on some platforms, it means that C++ can now be used by CPython developers. It doesn't make sense to force people to use C for everything but the JIT part. In the

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Michael Foord
On 23 Jan 2010, at 07:53, Martin v. Löwis mar...@v.loewis.de wrote: [snip...] Yes, definitely. It is this very reasoning that caused Python 2.x to use ASCII as the default encoding (when mixing strings and unicode), and, for the entire lifetime of 2.x, has caused endless pain for developers,

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Stephen J. Turnbull
Michael Foord writes: This is why I'm keen that by *default* Python should honour the UTF8 signature when reading files; Unfortunately, your caveat about a lot of the time it will *seem* to work applies to this as well. The only way that honoring signatures really works is if Python

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Michael Foord
On 24/01/2010 14:23, Stephen J. Turnbull wrote: Michael Foord writes: This is why I'm keen that by *default* Python should honour the UTF8 signature when reading files; Unfortunately, your caveat about a lot of the time it will *seem* to work applies to this as well. The only way that

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Stephen J. Turnbull
Michael Foord writes: When reading text files the presence of the UTF-8 signature *almost invariably* means a UTF-8 encoding. Honouring this will almost always be better than using the wrong encoding. Of course there are caveats, but it will be a substantial improvement. Sure, that
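To make the proposal under discussion concrete, here is a minimal sketch of what "honouring the UTF-8 signature" could look like when decoding a file's bytes. The function name `decode_honouring_signature` and the `fallback` parameter are hypothetical, chosen for illustration; this is not what any Python release does by default.

```python
import codecs


def decode_honouring_signature(data: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes, trusting a leading UTF-8 signature (BOM) if present.

    Hypothetical helper: 'fallback' stands in for whatever default
    encoding would otherwise apply (for example the locale's).
    """
    if data.startswith(codecs.BOM_UTF8):
        # Strip the three signature bytes and decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No signature: nothing in the data tells us the encoding.
    return data.decode(fallback)
```

The built-in `utf-8-sig` codec also strips the signature on decode, but it always decodes as UTF-8; the sketch above differs only in falling back to another encoding when no signature is found.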

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Antoine Pitrou
Stephen J. Turnbull stephen at xemacs.org writes: That's throwing the baby out with the bathwater. Very few practical applications that care about the input encoding are going to be willing to accept an output encoding that doesn't correspond to the input encoding in an appropriate way.

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Stephen J. Turnbull
Antoine Pitrou writes: Perhaps you are speaking with your emacs hat, where the purpose is to output to the same file that serves as input. No, I'm not wearing my Emacs hat. If I was, there would be no problem. You just use binary for most such purposes. Historically that was how even

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Antoine Pitrou
Stephen J. Turnbull stephen at xemacs.org writes: But it *does* determine the charset of ErrorDocuments displayed by Apache. Users are likely to get somewhat confused if the ErrorDocuments are in a different charset from your dynamic HTML. Why would they? The browser picks the encoding from

Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-24 Thread Cesare Di Mauro
2010/1/24 Floris Bruynooghe floris.bruynoo...@gmail.com Introducing C++ is a big step, but I disagree that it means C++ should be allowed in the other CPython code. C++ can be problematic on more obscure platforms (certainly when static initialisers are used) and being able to build a python

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Martin v. Löwis
However it is likely to be often wrong, and where the user's locale specifies an encoding like CP1252 then it will result in silent corruption rather than an immediate exception. Why do you say that? Why do you think it will likely be often wrong? Most likely, encoding text files with cp1252
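The "silent corruption rather than an immediate exception" point can be illustrated with a short sketch. It assumes, for illustration only, that a file was written as UTF-8 and is then read with a CP1252 or ASCII default; `try_decode` is a hypothetical helper.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if decoding raises an error."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


text = "naïve café"
utf8_bytes = text.encode("utf-8")

# CP1252 maps almost every byte to some character, so the wrong
# decoding succeeds and yields mojibake instead of failing.
as_cp1252 = try_decode(utf8_bytes, "cp1252")

# ASCII rejects any byte above 0x7F, so the mistake surfaces at once.
as_ascii = try_decode(utf8_bytes, "ascii")
```

Here `as_cp1252` is a string that differs from `text`, while `as_ascii` is `None` because the decode raised.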

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Martin v. Löwis
So what is your naive programmer supposed to expect when writing a cat program? This may be a bit out of context - however, a simple cat program should open files in binary, and be done. (not sure whether the average naive programmer is able to grasp the notion of binary IO and to oppose to
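A simple cat that opens its files in binary, as suggested here, might be sketched as follows. The function name and its `out` parameter are assumptions made for illustration; the point is only that no decoding happens anywhere.

```python
import shutil


def cat(paths, out):
    """Copy each file to 'out' byte for byte, without decoding.

    'out' must be a binary stream, e.g. sys.stdout.buffer.
    """
    for path in paths:
        with open(path, "rb") as src:
            # copyfileobj streams in chunks, so large files are fine.
            shutil.copyfileobj(src, out)
```

As a command-line tool this would be called as `cat(sys.argv[1:], sys.stdout.buffer)`. Because the bytes are never interpreted, the encoding of the input cannot cause an error or corruption.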

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Michael Foord
On 24/01/2010 18:41, Martin v. Löwis wrote: However it is likely to be often wrong, and where the user's locale specifies an encoding like CP1252 then it will result in silent corruption rather than an immediate exception. Why do you say that? Why do you think it will likely be often

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Oleg Broytman
On Sun, Jan 24, 2010 at 07:45:20PM +0100, Martin v. Löwis wrote: This may be a bit out of context - however, a simple cat program should open files in binary, and be done. (not sure whether the average naive programmer is able to grasp the notion of binary IO and to oppose to text IO, and

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Martin v. Löwis
I concede that I have no better statistics on the matter than you do, but I think that's wishful thinking. It is quite common for pure output to be mixed with echoed input, for example. Even if a file is converted to another format (eg, restructured text to LaTeX), it's very common for the

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Antoine Pitrou
Oleg Broytman phd at phd.pp.ru writes: Depends on the kind of cat and especially on the ways of using it. If you ask cat to number lines (see manual for GNU cat) - what do lines mean for binary IO? b"\n"-separated chunks of data. See the docs:
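The answer given here, that lines in binary IO are chunks separated by the newline byte, can be sketched as a line-numbering filter that never decodes its input. The function name and the six-column number format are illustrative assumptions, loosely modelled on `cat -n`.

```python
import io


def number_lines(src, out):
    """Prefix each newline-terminated chunk of a binary stream with its number.

    Iterating over a binary file object yields chunks that end at the
    byte 0x0A, whatever the other bytes mean.
    """
    for lineno, line in enumerate(src, start=1):
        out.write(b"%6d\t" % lineno + line)


src = io.BytesIO(b"caf\xe9\nsecond\n")
out = io.BytesIO()
number_lines(src, out)
```

The 0xE9 byte passes through untouched. This works for ASCII-compatible encodings; for an encoding such as UTF-16 the byte 0x0A does not reliably mark a line end, which is the kind of case the thread is arguing about.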

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Alexander Belopolsky
On Sun, Jan 24, 2010 at 1:54 PM, Oleg Broytman p...@phd.pp.ru wrote: Depends on the kind of cat and especially on the ways of using it. If you ask cat to number lines (see manual for GNU cat) - what do lines mean for binary IO? Maybe this is yet another reason why some kinds of cat are a

[Python-Dev] Python 2.5.5 Release Candidate 2

2010-01-24 Thread Martin v. Löwis
Subject: [ANN] Python 2.5.5 Release Candidate 2. On behalf of the Python development team and the Python community, I'm happy to announce the release candidate 2 of Python 2.5.5. This is a source-only release that only includes security fixes. The last full bug-fix release of Python 2.5 was

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Tres Seaver
Stephen J. Turnbull wrote: You just can't get away from the need for explicit management of codecs if you want a robust internationalized application. I don't object to giving users an easy way to get the behavior Michael proposes; it just

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Stephen J. Turnbull
Antoine Pitrou writes: Stephen J. Turnbull stephen at xemacs.org writes: But it *does* determine the charset of ErrorDocuments displayed by Apache. Users are likely to get somewhat confused if the ErrorDocuments are in a different charset from your dynamic HTML. Why would

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Stephen J. Turnbull
Martin v. Löwis writes: My bet is that the majority of Python applications written today do web stuff. In the web, input encoding and output encoding are fairly decorrelated - in particular for databases and files read from disk. Sure. Which means that programmers have to do a lot of

Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-24 Thread Martin v. Löwis
Using any guessing based on the locale (which describes the codec used by the user's console, but is completely uncorrelated to any particular file on the user's filesystem) No, it's not just the encoding of the console. It is also the encoding that text editors will use, in absence of a more
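For readers who want to see which encoding the locale actually specifies on their own system, a minimal sketch follows. `locale_encoding` is a hypothetical helper; whether a given Python version uses this value as the default for text files depends on the version and its configuration.

```python
import codecs
import locale


def locale_encoding() -> str:
    """Return the normalised codec name that the user's locale prefers."""
    # False: report the preference without calling setlocale() first.
    name = locale.getpreferredencoding(False)
    # codecs.lookup() maps aliases such as 'ANSI_X3.4-1968' to 'ascii'.
    return codecs.lookup(name).name
```

The result varies by machine, for example `utf-8` on many Linux systems or `cp1252` on a Western European Windows install, which is why the thread disagrees about how well it predicts the encoding of any particular file.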