Tony Nelson wrote:
For decoding it should be sufficient to use a unicode string of
length 256. u"\ufffd" could be used for "maps to undefined". Or the
string might be shorter and byte values greater than the length of
the string are treated as maps to undefined too.
With Unicode using more than
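
A minimal sketch of the decoding-table idea described above, written in Python 3 syntax rather than the Python 2 of the thread. The names make_decoding_table and decode_with_table are hypothetical and are not taken from any patch discussed here; the real decoder under discussion is implemented in C.

```python
# Illustrative only: helper names are assumptions, not part of CPython.
REPLACEMENT = "\ufffd"  # stands for "maps to undefined"

def make_decoding_table(mapping):
    """Build a 256-character string; position i holds the character for byte i."""
    return "".join(mapping.get(i, REPLACEMENT) for i in range(256))

def decode_with_table(data, table):
    """Decode bytes by indexing into the table string."""
    # Byte values beyond the end of a shorter table count as undefined too.
    return "".join(table[b] if b < len(table) else REPLACEMENT for b in data)

# Example mapping: ASCII identity plus one high byte mapped to the euro sign.
mapping = {i: chr(i) for i in range(128)}
mapping[0xA4] = "\u20ac"
table = make_decoding_table(mapping)
result = decode_with_table(b"a\xa4\xff", table)
```

Here byte 0xFF has no entry in the mapping, so it decodes to U+FFFD, and truncating the table to 128 characters would make every high byte decode to U+FFFD as well.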
Tony Nelson wrote:
But is there really no way to say this fast in pure Python? The way a
one-to-one byte mapping can be done with .translate()?
Well, .translate isn't exactly pure Python. One-to-one between bytes
and Unicode code points simply can't work. Just try all alternatives
yourself
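
For reference, a small sketch of the one-to-one byte mapping that .translate() handles, shown with Python 3's bytes.translate (in the thread's era the equivalent was str.translate on 8-bit strings). The table below is an arbitrary example chosen for illustration.

```python
# A 256-byte table: identity, except lowercase ASCII letters map to uppercase.
table = bytes(b - 32 if 97 <= b <= 122 else b for b in range(256))

# Each input byte is replaced by the byte at that index in the table.
out = b"hello, world".translate(table)
```

This works because both sides are single bytes. A byte-to-code-point mapping needs a table whose entries can be wider than a byte, which is why the same trick does not carry over directly.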
Am 05.10.2005 um 00:08 schrieb Martin v. Löwis:
Walter Dörwald wrote:
This array would have to be sparse, of course.
For encoding yes, for decoding no.
[...]
For decoding it should be sufficient to use a unicode string of
length 256. u"\ufffd" could be used for "maps to undefined". Or
Martin v. Löwis wrote:
Another option would be to generate a big switch statement in C
and let the compiler decide about the best data structure.
I would try to avoid generating C code at all costs. Maintaining the
build processes will just be a nightmare.
We could automate this using
The function in the module below, xlate.xlate, doesn't quite do what .decode
does (mostly that characters that don't exist are always mapped to U+FFFD,
instead of having the various behaviors available to .decode).
It builds the fast decoding structure once per call, but when decoding 53kb of
data
Guido van Rossum wrote:
On 10/4/05, Nick Coghlan [EMAIL PROTECTED] wrote:
I was planning on looking at your patch too, but I was waiting for an answer
from Guido about the fate of the ast-branch for Python 2.5. Given that we have
patches for PEP 342 and PEP 343 against the trunk, but ast-branch
On 10/5/05, Nick Coghlan [EMAIL PROTECTED] wrote:
Anyway, the question is: What do we want to do with ast-branch? Finish
bringing it up to Python 2.4 equivalence, make it the HEAD, and only then
implement the approved PEPs (308, 342, 343) that affect the compiler? Or
implement the approved
On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Of course, a C version could use the same approach as
the unicodedatabase module: that of compressed lookup
tables...
http://aggregate.org/TechPub/lcpc2002.pdf
genccodec.py anyone ?
I had written a test codec for single byte
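
One way to picture a compressed lookup table for the sparse encoding direction is a generic two-level table in which identical blocks are stored only once. This is a sketch under assumptions: build_two_level and lookup are hypothetical names, the block size of 16 is arbitrary, and nothing here claims to match the layout actually used by the unicodedata module or by any codec generator mentioned in the thread.

```python
# Illustrative two-level table: code point -> byte, with -1 meaning "undefined".
def build_two_level(mapping, block_size=16):
    """Split the code point range into blocks and share identical blocks."""
    blocks = []   # unique blocks of byte values
    index = []    # for each block number, its position in `blocks`
    seen = {}
    max_cp = max(mapping)  # assumes a non-empty mapping
    for start in range(0, max_cp + 1, block_size):
        block = tuple(mapping.get(cp, -1) for cp in range(start, start + block_size))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, cp, block_size=16):
    """Return the byte for code point cp, or -1 if it has no mapping."""
    hi, lo = divmod(cp, block_size)
    if hi >= len(index):
        return -1
    return blocks[index[hi]][lo]

# Example: 'A' maps to itself, the euro sign maps to byte 0xA4.
mapping = {0x41: 0x41, 0x20AC: 0xA4}
index, blocks = build_two_level(mapping)
```

With this mapping the index has one entry per 16 code points up to U+20AC, but only three distinct blocks are stored, because every all-undefined block is shared.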
Hye-Shik Chang wrote:
On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Of course, a C version could use the same approach as
the unicodedatabase module: that of compressed lookup
tables...
http://aggregate.org/TechPub/lcpc2002.pdf
genccodec.py anyone ?
I had written a test
M.-A. Lemburg wrote:
I would try to avoid generating C code at all costs. Maintaining the
build processes will just be a nightmare.
We could automate this using distutils; however I'm not sure
whether this would then also work on Windows.
It wouldn't.
Regards,
Martin
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I would try to avoid generating C code at all costs. Maintaining the
build processes will just be a nightmare.
We could automate this using distutils; however I'm not sure
whether this would then also work on Windows.
It wouldn't.
Could
Martin v. Löwis wrote:
Walter Dörwald wrote:
OK, here's a patch that implements this enhancement to
PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939
Looks nice!
Indeed (except for the choice of the "map this character
to undefined" code point).
Hye-Shik, could you please provide
To answer Nick's email here, I didn't respond to that initial email
because it seemed specifically directed at Guido and not me.
On 10/5/05, Guido van Rossum [EMAIL PROTECTED] wrote:
On 10/5/05, Nick Coghlan [EMAIL PROTECTED] wrote:
Anyway, the question is: What do we want to do with
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
It wouldn't.
Could you elaborate why not ? Using distutils on Windows is really
easy...
The current build process for Windows simply doesn't provide it.
You expect to select Build/All from the menu (or some such),
and expect all code to
[Martin v. Loewis wrote]
Maybe it is possible to hack up a project file to invoke distutils
as the build process, but no such project file is currently available,
nor is it known whether it is possible to create one.
This is essentially what the _ssl project does, no? It defers to
On 10/6/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Hye-Shik, could you please provide some timeit figures for
the fastmap encoding ?
(before applying Walter's patch, charmap decoder)
% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "s.decode(e)"
100 loops, best of 3:
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
(anyone still thinking about removing the block stack?).
I'm not any more. My thought was that it would be good for performance, by
reducing the memory allocation overhead for frames enough to allow pymalloc
to be used instead of the platform
On 10/5/05, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
(anyone still thinking about removing the block stack?).
I'm not any more. My thought was that it would be good for performance, by
reducing the memory allocation overhead for frames enough