Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
Tony Nelson wrote: For decoding it should be sufficient to use a unicode string of length 256. u"\ufffd" could be used for "maps to undefined". Or the string might be shorter and byte values greater than the length of the string are treated as "maps to undefined" too. With Unicode using more than
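
The lookup-string idea in the snippet above can be illustrated with a minimal pure-Python sketch. It is written for Python 3 rather than the Python 2 of the thread, and the names `decode_with_table`, `REPLACEMENT` and `table` are hypothetical, chosen only for this illustration. It is not the patch discussed in the thread, only a rendering of the described behaviour: index a string by byte value, and treat byte values beyond the end of the string as undefined.

```python
# Hypothetical sketch (Python 3); not the C implementation from the thread.
REPLACEMENT = "\ufffd"  # stands in for "maps to undefined"


def decode_with_table(data, table):
    """Decode bytes by indexing into a string of at most 256 characters.

    Byte values at or beyond len(table) are treated as undefined.
    """
    size = len(table)
    return "".join(table[b] if b < size else REPLACEMENT for b in data)


# Assumed example table: ASCII for 0x00-0x7F, then the euro sign at 0x80.
# The table is deliberately shorter than 256 entries.
table = "".join(chr(i) for i in range(128)) + "\u20ac"

print(decode_with_table(b"A\x80\xff", table))
```

Here byte 0x80 is looked up in the table, while 0xFF falls past the end of the table and is reported as undefined.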

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
Tony Nelson wrote: But is there really no way to say this fast in pure Python? The way a one-to-one byte mapping can be done with .translate()? Well, .translate isn't exactly pure Python. One-to-one between bytes and Unicode code points simply can't work. Just try all alternatives yourself

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Walter Dörwald
On 05.10.2005 at 00:08, Martin v. Löwis wrote: Walter Dörwald wrote: This array would have to be sparse, of course. For encoding yes, for decoding no. [...] For decoding it should be sufficient to use a unicode string of length 256. u"\ufffd" could be used for "maps to undefined". Or

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Another option would be to generate a big switch statement in C and let the compiler decide about the best data structure. I would try to avoid generating C code at all costs. Maintaining the build processes will just be a nightmare. We could automate this using

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread jepler
The function in the module below, xlate.xlate, doesn't quite do what .decode does. (Mostly that characters that don't exist are always mapped to U+FFFD, instead of having the various behaviors available to .decode.) It builds the fast decoding structure once per call, but when decoding 53kb of data

[Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Nick Coghlan
Guido van Rossum wrote: On 10/4/05, Nick Coghlan [EMAIL PROTECTED] wrote: I was planning on looking at your patch too, but I was waiting for an answer from Guido about the fate of the ast-branch for Python 2.5. Given that we have patches for PEP 342 and PEP 343 against the trunk, but ast-branch

Re: [Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Guido van Rossum
On 10/5/05, Nick Coghlan [EMAIL PROTECTED] wrote: Anyway, the question is: What do we want to do with ast-branch? Finish bringing it up to Python 2.4 equivalence, make it the HEAD, and only then implement the approved PEPs (308, 342, 343) that affect the compiler? Or implement the approved

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Hye-Shik Chang
On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Of course, a C version could use the same approach as the unicodedatabase module: that of compressed lookup tables... http://aggregate.org/TechPub/lcpc2002.pdf genccodec.py anyone ? I had written a test codec for single byte
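
As a rough illustration of what a compressed lookup table for a single-byte codec could look like, here is a two-stage table sketch in Python 3. This is an assumption about the general technique, not the layout used by the unicodedata module or by any code posted in the thread; `build_two_level`, `encode_char`, the block size and the `-1` "unmapped" marker are all hypothetical choices made for the example.

```python
# Hypothetical two-stage encoding table; not taken from the thread.
UNDEFINED = "\ufffd"


def build_two_level(decoding_table, shift=8):
    """Return (index, blocks) mapping code points back to byte values.

    index[cp >> shift] selects a block; identical blocks are stored once.
    """
    block_size = 1 << shift
    reverse = {}
    for byte, ch in enumerate(decoding_table):
        if ch != UNDEFINED:
            reverse.setdefault(ord(ch), byte)

    empty = (-1,) * block_size  # shared block for ranges with no mapping
    blocks = [empty]
    block_ids = {empty: 0}
    index = []
    max_cp = max(reverse) if reverse else 0
    for hi in range((max_cp >> shift) + 1):
        block = tuple(
            reverse.get((hi << shift) | lo, -1) for lo in range(block_size)
        )
        if block not in block_ids:
            block_ids[block] = len(blocks)
            blocks.append(block)
        index.append(block_ids[block])
    return index, blocks


def encode_char(ch, index, blocks, shift=8):
    """Return the byte value for ch, or -1 if the charset cannot encode it."""
    cp = ord(ch)
    hi = cp >> shift
    if hi >= len(index):
        return -1
    return blocks[index[hi]][cp & ((1 << shift) - 1)]


# Assumed example charset: Latin-1 with the euro sign replacing 0xA4.
latin1 = "".join(chr(i) for i in range(256))
charset = latin1[:0xA4] + "\u20ac" + latin1[0xA5:]
index, blocks = build_two_level(charset)
print(len(index), len(blocks), encode_char("\u20ac", index, blocks))
```

The point of the layout is that the many code point ranges with no mapping all share one block, so the table stays small even though the mapped code points are spread out.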

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Hye-Shik Chang wrote: On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Of course, a C version could use the same approach as the unicodedatabase module: that of compressed lookup tables... http://aggregate.org/TechPub/lcpc2002.pdf genccodec.py anyone ? I had written a test

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
M.-A. Lemburg wrote: I would try to avoid generating C code at all costs. Maintaining the build processes will just be a nightmare. We could automate this using distutils; however I'm not sure whether this would then also work on Windows. It wouldn't. Regards, Martin

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I would try to avoid generating C code at all costs. Maintaining the build processes will just be a nightmare. We could automate this using distutils; however I'm not sure whether this would then also work on Windows. It wouldn't. Could

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Walter Dörwald wrote: OK, here's a patch that implements this enhancement to PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939 Looks nice! Indeed (except for the choice of the "map this character to undefined" code point). Hye-Shik, could you please provide

Re: [Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Brett Cannon
To answer Nick's email here, I didn't respond to that initial email because it seemed specifically directed at Guido and not me. On 10/5/05, Guido van Rossum [EMAIL PROTECTED] wrote: On 10/5/05, Nick Coghlan [EMAIL PROTECTED] wrote: Anyway, the question is: What do we want to do with

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: It wouldn't. Could you elaborate why not ? Using distutils on Windows is really easy... The current build process for Windows simply doesn't provide it. You expect to select Build/All from the menu (or some such), and expect all code to

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Trent Mick
[Martin v. Loewis wrote] Maybe it is possible to hack up a project file to invoke distutils as the build process, but no such project file is currently available, nor is it known whether it is possible to create one. This is essentially what the _ssl project does, no? It defers to

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Hye-Shik Chang
On 10/6/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Hye-Shik, could you please provide some timeit figures for the fastmap encoding ? (before applying Walter's patch, charmap decoder) % ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10'; u=unicode(s, e)" "s.decode(e)" 100 loops, best of 3:
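
For readers who want to repeat this kind of measurement today, a comparable benchmark can be sketched with the `timeit` module in Python 3. This is an adaptation, not the command from the message: a bytes literal replaces the Python 2 str, and the variable names are chosen for the example. No timing figures are given here because they depend entirely on the machine and the Python version.

```python
import timeit

# 53 KiB of ASCII "a", decoded with a charmap-based codec, as in the message.
data = b"a" * 53 * 1024
encoding = "iso8859_10"

loops = 100
timings = timeit.repeat(lambda: data.decode(encoding), repeat=3, number=loops)
best_per_loop = min(timings) / loops

print(f"{loops} loops, best of 3: {best_per_loop * 1e6:.1f} usec per loop")
```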

[Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

2005-10-05 Thread Phillip J. Eby
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote: (anyone still thinking about removing the block stack?). I'm not any more. My thought was that it would be good for performance, by reducing the memory allocation overhead for frames enough to allow pymalloc to be used instead of the platform

Re: [Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

2005-10-05 Thread Neal Norwitz
On 10/5/05, Phillip J. Eby [EMAIL PROTECTED] wrote: At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote: (anyone still thinking about removing the block stack?). I'm not any more. My thought was that it would be good for performance, by reducing the memory allocation overhead for frames enough