Greg Ewing wrote:
Ron Adam wrote:
This uses syntax to determine the direction of encoding. It would be
easier and clearer to just require two arguments or a tuple.
u = unicode(b, 'encode', 'base64')
b = bytes(u, 'decode', 'base64')
The point of the exercise was to avoid
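For reference, this is roughly how the round trip the thread is debating eventually looks with the base64 module in today's Python 3, where the transform itself is bytes-to-bytes and a separate ASCII decode yields text (a sketch for orientation, not part of the original thread):

```python
import base64

b = b"some binary \x00 payload"
# bytes -> base64 -> text: encode, then decode the ASCII result to str
u = base64.b64encode(b).decode("ascii")
# text -> de-base64 -> bytes: the inverse round trip
assert base64.b64decode(u.encode("ascii")) == b
```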
Stephen J. Turnbull wrote:
Doesn't that make base64 non-text by analogy to other "look but don't
touch" strings like a .gz or vmlinuz?
No, because I can take a piece of base64 encoded data
and use a text editor to manually paste it in with some
other text (e.g. a plain-text (not MIME) mail
Ron Adam wrote:
This would apply to codecs that
could return either bytes or strings, or strings or unicode, or bytes or
unicode.
I'd need to see some concrete examples of such codecs
before being convinced that they exist, or that they
couldn't just as well return a fixed type that you
Greg Ewing wrote:
Ron Adam wrote:
This would apply to codecs that
could return either bytes or strings, or strings or unicode, or bytes or
unicode.
I'd need to see some concrete examples of such codecs
before being convinced that they exist, or that they
couldn't just as well return a
Josiah Carlson wrote:
Greg Ewing [EMAIL PROTECTED] wrote:
u = unicode(b)
u = unicode(b, 'utf8')
b = bytes['utf8'](u)
u = unicode['base64'](b) # encoding
b = bytes(u, 'base64') # decoding
u2 = unicode['piglatin'](u1) # encoding
u1 = unicode(u2, 'piglatin') #
Ron Adam wrote:
Josiah Carlson wrote:
Greg Ewing [EMAIL PROTECTED] wrote:
u = unicode(b)
u = unicode(b, 'utf8')
b = bytes['utf8'](u)
u = unicode['base64'](b) # encoding
b = bytes(u, 'base64') # decoding
u2 = unicode['piglatin'](u1) # encoding
u1 =
Ron Adam wrote:
This uses syntax to determine the direction of encoding. It would be
easier and clearer to just require two arguments or a tuple.
u = unicode(b, 'encode', 'base64')
b = bytes(u, 'decode', 'base64')
The point of the exercise was to avoid using the terms
Stephen J. Turnbull wrote:
What you presumably meant was what would you consider the proper type
for (P)CDATA?
No, I mean the whole thing, including all the ... tags
etc. Like you see when you load an XML file into a text
editor. (BTW, doesn't the fact that you *can* load an
XML file into what
Greg == Greg Ewing [EMAIL PROTECTED] writes:
Greg (BTW, doesn't the fact that you *can* load an XML file into
Greg what we call a text editor say something?)
Why not answer that question for yourself, and then turn that answer
into a description of text semantics?
For me, it says that,
On Tue, 2006-02-28 at 15:23 -0800, Bill Janssen wrote:
Greg Ewing wrote:
Bill Janssen wrote:
bytes -> base64 -> text
text -> de-base64 -> bytes
It's nice to hear I'm not out of step with
the entire world on this. :-)
Well, I can certainly understand the bytes-base64-bytes side of
Bill Janssen wrote:
Greg Ewing wrote:
Bill Janssen wrote:
bytes -> base64 -> text
text -> de-base64 -> bytes
It's nice to hear I'm not out of step with
the entire world on this. :-)
Well, I can certainly understand the bytes-base64-bytes side of
thing too. The text produced is specified as
Nick Coghlan wrote:
All the unicode codecs, on the other hand, use encode to get from characters
to bytes and decode to get from bytes to characters.
So if bytes objects *did* have an encode method, it should still result in a
unicode object, just the same as a decode method does (because
Ron Adam writes:
While playing around with the example bytes class I noticed code reads
much better when I use methods called tounicode and tostring.
[...]
I'm not suggesting we start using to-type everywhere, just where it
might make things clearer over decode and encode.
+1
I always
Huh... just joining here but surely you don't mean a text string that
doesn't use every character available in a particular encoding is
really bytes... it's still a text string...
No, once it's in a particular encoding it's bytes, no longer text.
As you say,
Keep these two concepts separate
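The "text vs. bytes in a particular encoding" distinction being urged here is exactly how Python 3 ended up drawing the line; a minimal illustration:

```python
s = "caf\u00e9"          # text: a sequence of characters
b = s.encode("utf-8")    # the same content in a particular encoding: bytes
assert isinstance(s, str) and isinstance(b, bytes)
assert b.decode("utf-8") == s   # decoding recovers the text
```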
Chermside, Michael wrote:
... I will say that if there were no legacy I'd prefer the tounicode()
and tostring() (but shouldn't it be 'tobytes()' instead?) names for Python 3.0.
Wouldn't 'tobytes' and 'totext' be better for 3.0 where text == unicode?
--
-- Scott David Daniels
[EMAIL PROTECTED]
I wrote:
... I will say that if there were no legacy I'd prefer the tounicode()
and tostring() (but shouldn't it be 'tobytes()' instead?) names for Python 3.0.
Scott Daniels replied:
Wouldn't 'tobytes' and 'totext' be better for 3.0 where text == unicode?
Um... yes. Sorry, I'm not completely
Nick Coghlan wrote:
ascii_bytes = orig_bytes.decode("base64").encode("ascii")
orig_bytes = ascii_bytes.decode("ascii").encode("base64")
The only slightly odd aspect is that this inverts the conventional meaning of
base64 encoding and decoding,
-1. Whatever we do, we shouldn't design
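As it happens, Python 3 kept base64 available through the codecs machinery as a bytes-to-bytes transform, reachable via `codecs.encode`/`codecs.decode` rather than the string methods, which sidesteps the naming inversion Nick describes (a sketch of the settled behavior):

```python
import codecs

orig_bytes = b"\x00\x01\x02 binary payload"
ascii_bytes = codecs.encode(orig_bytes, "base64")   # bytes -> base64 bytes
assert codecs.decode(ascii_bytes, "base64") == orig_bytes
```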
Bill Janssen wrote:
No, once it's in a particular encoding it's bytes, no longer text.
The point at issue is whether the characters produced
by base64 are in a particular encoding. According to
my reading of the RFC, they're not.
--
Greg Ewing, Computer Science Dept,
Ron Adam wrote:
While playing around with the example bytes class I noticed code reads
much better when I use methods called tounicode and tostring.
b64ustring = b.tounicode('base64')
b = bytes(b64ustring, 'base64')
I don't like that, because it creates a dependency
(conceptually,
[My apologies Greg; I meant to send this to the whole list. I really
need a list-reply button in GMail. ]
On 3/1/06, Greg Ewing [EMAIL PROTECTED] wrote:
I don't like that, because it creates a dependency
(conceptually, at least) between the bytes type and
the unicode type.
I only find half of
Greg Ewing wrote:
Ron Adam wrote:
While playing around with the example bytes class I noticed code reads
much better when I use methods called tounicode and tostring.
b64ustring = b.tounicode('base64')
b = bytes(b64ustring, 'base64')
I don't like that, because it creates a
Greg Ewing [EMAIL PROTECTED] wrote:
u = unicode(b)
u = unicode(b, 'utf8')
b = bytes['utf8'](u)
u = unicode['base64'](b) # encoding
b = bytes(u, 'base64') # decoding
u2 = unicode['piglatin'](u1) # encoding
u1 = unicode(u2, 'piglatin') # decoding
Your provided
Bill Janssen wrote:
Well, I can certainly understand the bytes-base64-bytes side of
thing too. The text produced is specified as using a 65-character
subset of US-ASCII, so that's really bytes.
But it then goes on to say that these same characters
are also a subset of EBCDIC. So it seems to
Bill Janssen wrote:
I use it quite a bit for image processing (converting to and from the
data: URL form), and various checksum applications (converting SHA
into a string).
Aha! We have a customer!
For those cases, would you find it more convenient
for the result to be text or bytes in Py3k?
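Bill's checksum use case translates directly to modern Python; the sketch below uses SHA-256 purely as an illustrative stand-in (the thread predates it being the common default):

```python
import base64
import hashlib

digest = hashlib.sha256(b"some image data").digest()       # 32 raw bytes
checksum_text = base64.b64encode(digest).decode("ascii")   # checksum as text
assert len(checksum_text) == 44   # 32 bytes -> 44 base64 characters
```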
Ron == Ron Adam [EMAIL PROTECTED] writes:
Ron So, lets consider a codec and a coding as being two
Ron different things where a codec is a character sub set of
Ron unicode characters expressed in a native format. And a
Ron coding is *not* a subset of the unicode character set,
Stephen J. Turnbull wrote:
The reason that Python source code is text is that the primary
producers/consumers of Python source code are human beings, not
compilers
I disagree with primary -- I think human and computer
use of source code have equal importance. Because of the
fact that Python
Ron == Ron Adam [EMAIL PROTECTED] writes:
Ron We could call it transform or translate if needed.
You're still losing the directionality, which is my primary objection
to recode. The absence of directionality is precisely why recode
is used in that sense for i18n work.
There really isn't a
Greg == Greg Ewing [EMAIL PROTECTED] writes:
Greg Stephen J. Turnbull wrote:
No, base64 isn't a wire protocol. It's a family[...].
Greg Yes, and it's up to the programmer to choose those code
Greg units (i.e. pick an encoding for the characters) that will,
Greg in fact,
* The following reply is a rather longer than I intended explanation of
why codings (and how they differ) like 'rot' aren't the same thing as
pure unicode codecs and probably should be treated differently.
If you already understand that, then I suggest skipping this. But if
you like detailed
Stephen J. Turnbull wrote:
the kind of text for which Unicode was designed is normally produced
and consumed by people, who wll pt up w/ ll knds f nnsns. Base64
decoders will not put up with the same kinds of nonsense that people
will.
The Python compiler won't put up with that sort of
Stephen J. Turnbull wrote:
Please define character, and explain how its semantics map to
Python's unicode objects.
One of the 65 abstract entities referred to in the RFC
and represented in that RFC by certain visual glyphs.
There is a subset of the Unicode code points that
are conventionally
Greg == Greg Ewing [EMAIL PROTECTED] writes:
Greg Stephen J. Turnbull wrote:
What I advocate for Python is to require that the standard
base64 codec be defined only on bytes, and always produce
bytes.
Greg I don't understand that. It seems quite clear to me that
Greg
Stephen J. Turnbull wrote:
Base64 is a (family of) wire protocol(s). It's not clear to me that
it makes sense to say that the alphabets used by baseNN encodings
are composed of characters,
Take a look at
http://en.wikipedia.org/wiki/Base64
where it says
...base64 is a binary to text
On Feb 22, 2006, at 6:35 AM, Greg Ewing wrote:
I'm thinking of convenience, too. Keep in mind that in Py3k,
'unicode' will be called 'str' (or something equally neutral
like 'text') and you will rarely have to deal explicitly with
unicode codings, this being done mostly for you by the I/O
Greg Ewing [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
Efficiency is an implementation concern.
It is also a user concern, especially if inefficiency overruns memory
limits.
In Py3k, strings
which contain only ascii or latin-1 might be stored as
1 byte per character, in
Terry Reedy wrote:
Greg Ewing [EMAIL PROTECTED] wrote in message
Which is why I think that only *unicode* codings should be
available through the .encode and .decode interface. Or
alternatively there should be something more explicit like
.unicode_encode and .unicode_decode that is thus
Terry Reedy wrote:
Greg Ewing [EMAIL PROTECTED] wrote in message
Efficiency is an implementation concern.
It is also a user concern, especially if inefficiency overruns memory
limits.
Sure, but what I mean is that it's better to find what's
conceptually right and then look for an efficient
Ron Adam wrote:
While I prefer constructors with an explicit encode argument, and use a
recode() method for 'like to like' coding. Then the whole encode/decode
confusion goes away.
I'd be happy with that, too.
--
Greg Ewing, Computer Science Dept, +--+
James Y Knight wrote:
Some MIME sections
might have a base64 Content-Transfer-Encoding, others might be 8bit
encoded, others might be 7bit encoded, others might be quoted- printable
encoded.
I stand corrected -- in that situation you would have to encode
the characters before combining
Greg == Greg Ewing [EMAIL PROTECTED] writes:
Greg Stephen J. Turnbull wrote:
Base64 is a (family of) wire protocol(s). It's not clear to me
that it makes sense to say that the alphabets used by baseNN
encodings are composed of characters,
Greg Take a look at [this that
Ron == Ron Adam [EMAIL PROTECTED] writes:
Ron Terry Reedy wrote:
I prefer the shorter names and using recode, for instance, for
bytes to bytes.
Ron While I prefer constructors with an explicit encode argument,
Ron and use a recode() method for 'like to like' coding.
Stephen J. Turnbull wrote:
What I advocate for Python is to require that the standard base64
codec be defined only on bytes, and always produce bytes.
I don't understand that. It seems quite clear to me that
base64 encoding (in the general sense of encoding, not the
unicode sense) takes binary
On Sun, 2006-02-19 at 23:30 +0900, Stephen J. Turnbull wrote:
M == M.-A. Lemburg [EMAIL PROTECTED] writes:
M * for Unicode codecs the original form is Unicode, the derived
M form is, in most cases, a string
First of all, that's Martin's point!
Second, almost all Americans, a
Josiah Carlson wrote:
It doesn't seem strange to you to need to encode data twice to be able
to have a usable sequence of characters which can be embedded in an
effectively 7-bit email;
I'm talking about a 3.0 world where all strings are unicode
and the unicode -> external coding is for the
Martin == Martin v Löwis [EMAIL PROTECTED] writes:
Martin Stephen J. Turnbull wrote:
Bengt The characters in b could be encoded in plain ascii, or
Bengt utf16le, you have to know.
Which base64 are you thinking about? Both RFC 3548 and RFC
2045 (MIME) specify subsets of
On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters [EMAIL PROTECTED] wrote:
On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:
[...]
- The return value for the non-unicode encodings depends on the value of
the encoding argument.
Not really: you'll always get a basestring
Josiah == Josiah Carlson [EMAIL PROTECTED] writes:
Josiah I try to internalize it by not thinking of strings as
Josiah encoded data, but as binary data, and unicode as text. I
Josiah then remind myself that unicode isn't native on-disk or
Josiah cross-network (which stores and
Stephen J. Turnbull wrote:
Martin For an example where base64 is *not* necessarily
Martin ASCII-encoded, see the binary data type in XML
Martin Schema. There, base64 is embedded into an XML document,
Martin and uses the encoding of the entire XML document. As a
Martin
Martin == Martin v Löwis [EMAIL PROTECTED] writes:
Martin Please do take a look. It is the only way: If you were to
Martin embed base64 *bytes* into character data content of an XML
Martin element, the resulting XML file might not be well-formed
Martin anymore (if the encoding of
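Martin's XML point can be seen concretely: the base64 payload must be *characters* so that it takes on whatever encoding the whole document uses. A sketch, with UTF-16 standing in for "some non-ASCII document encoding":

```python
import base64

payload = base64.b64encode(b"\x00\x01\xff").decode("ascii")  # characters
doc = "<data>" + payload + "</data>"
# The entire document, base64 characters included, is encoded uniformly:
encoded_doc = doc.encode("utf-16")
assert payload in encoded_doc.decode("utf-16")
```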
On Feb 20, 2006, at 7:25 PM, Stephen J. Turnbull wrote:
Martin == Martin v Löwis [EMAIL PROTECTED] writes:
Martin Please do take a look. It is the only way: If you were to
Martin embed base64 *bytes* into character data content of an XML
Martin element, the resulting XML file
M.-A. Lemburg [EMAIL PROTECTED] writes:
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.
I forgot that:
Ian == Ian Bicking [EMAIL PROTECTED] writes:
Ian Encodings cover up eclectic interfaces, where those
Ian interfaces fit a basic pattern -- data in, data out.
Isn't "filter" the word you're looking for?
I think you've just made a very strong case that this is a slippery
slope that we
M == M.-A. Lemburg [EMAIL PROTECTED] writes:
M Martin v. Löwis wrote:
No. The reason to ban string.decode and bytes.encode is that it
confuses users.
M Instead of starting to ban everything that can potentially
M confuse a few users, we should educate those users and tell
M == M.-A. Lemburg [EMAIL PROTECTED] writes:
M The main reason is symmetry and the fact that strings and
M Unicode should be as similar as possible in order to simplify
M the task of moving from one to the other.
Those are perfectly compatible with Martin's suggestion.
M Still,
Josiah == Josiah Carlson [EMAIL PROTECTED] writes:
Josiah The question remains: is str.decode() returning a string
Josiah or unicode depending on the argument passed, when the
Josiah argument quite literally names the codec involved,
Josiah difficult to understand? I don't
Bob == Bob Ippolito [EMAIL PROTECTED] writes:
Bob On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
But you aren't always getting *unicode* text from the decoding
of bytes, and you may be encoding bytes *to* bytes:
Please note that I presumed that you can indeed assume that
Bengt == Bengt Richter [EMAIL PROTECTED] writes:
Bengt The characters in b could be encoded in plain ascii, or
Bengt utf16le, you have to know.
Which base64 are you thinking about? Both RFC 3548 and RFC 2045
(MIME) specify subsets of US-ASCII explicitly.
--
School of Systems and
Stephen J. Turnbull wrote:
BTW, what use cases do you have in mind for Unicode -> Unicode
decoding?
I think rot13 falls into that category: it is a transformation
on text, not on bytes.
For other odd cases: base64 goes Unicode -> bytes in the *decode*
direction, not in the encode direction. Some
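Martin's rot13 example is indeed how things settled: in Python 3 it survives as a text-to-text transform, reachable through `codecs.encode`/`codecs.decode` rather than `str.encode`:

```python
import codecs

# rot13 maps text to text; no bytes are involved at all
assert codecs.encode("Hello", "rot_13") == "Uryyb"
assert codecs.decode("Uryyb", "rot_13") == "Hello"
```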
Stephen J. Turnbull wrote:
Do you do any of the user education *about codec use* that you
recommend? The people I try to teach about coding invariably find it
difficult to understand. The problem is that the near-universal
intuition is that for human-usable text is pretty much anything *but
Stephen J. Turnbull wrote:
Bengt The characters in b could be encoded in plain ascii, or
Bengt utf16le, you have to know.
Which base64 are you thinking about? Both RFC 3548 and RFC 2045
(MIME) specify subsets of US-ASCII explicitly.
Unfortunately, it is ambiguous as to whether they
On Feb 19, 2006, at 10:55 AM, Martin v. Löwis wrote:
Stephen J. Turnbull wrote:
BTW, what use cases do you have in mind for Unicode -> Unicode
decoding?
I think rot13 falls into that category: it is a transformation
on text, not on bytes.
The current implementation is a transformation on
Stephen J. Turnbull [EMAIL PROTECTED] wrote:
Josiah == Josiah Carlson [EMAIL PROTECTED] writes:
Josiah The question remains: is str.decode() returning a string
Josiah or unicode depending on the argument passed, when the
Josiah argument quite literally names the codec
Josiah Carlson wrote:
Bob Ippolito [EMAIL PROTECTED] wrote:
On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
Greg Ewing [EMAIL PROTECTED] wrote:
Stephen J. Turnbull wrote:
Guido == Guido van Rossum [EMAIL PROTECTED] writes:
Guido - b = bytes(t, enc); t = text(b, enc)
+1 The coding
Aahz wrote:
The problem is that they don't understand that "Martin v. Löwis" is not
Unicode -- once all strings are Unicode, this is guaranteed to work.
This specific call, yes. I don't think the problem will go away as long
as both encode and decode are available for both strings and byte
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
Bengt Richter had a good idea with bytes.recode() for strictly bytes
transformations (and the equivalent for text), though it is ambiguous as
to the direction; are we encoding or decoding with bytes.recode()? In
my opinion, this is
Martin, v. Löwis wrote:
How are users confused?
Users do
py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)
because they want to convert the
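The pitfall Martin describes is Python 2's implicit ASCII decode before re-encoding. In Python 3 the same call is unambiguous because the source is already text:

```python
name = "Martin v. L\u00f6wis"   # already text in Python 3
data = name.encode("utf-8")     # text -> bytes, no implicit decode step
assert data.decode("utf-8") == name
```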
On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.
No. The reason to ban string.decode and bytes.encode is that
it confuses
This posting is entirely tangential. Be warned.
Martin v. Löwis [EMAIL PROTECTED] writes:
It's worse than that. The return *type* depends on the *value* of
the argument. I think there is little precedence for that:
There's one extremely significant example where the *value* of
something
Josiah Carlson wrote:
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
Bengt Richter had a good idea with bytes.recode() for strictly bytes
transformations (and the equivalent for text), though it is ambiguous as
to the direction; are we encoding or decoding with bytes.recode()? In
Thomas Wouters wrote:
On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've
On 2/18/06, Josiah Carlson [EMAIL PROTECTED] wrote:
Look at what we've currently got going for data transformations in the
standard library to see what these removals will do: base64 module,
binascii module, binhex module, uu module, ... Do we want or need to
add another top-level module for
On Sat, Feb 18, 2006, Ron Adam wrote:
I like the bytes.recode() idea a lot. +1
It seems to me it's a far more useful idea than encoding and decoding by
overloading and could do both and more. It has a lot of potential to be
an intermediate step for encoding as well as being used for many
Aahz wrote:
On Sat, Feb 18, 2006, Ron Adam wrote:
I like the bytes.recode() idea a lot. +1
It seems to me it's a far more useful idea than encoding and decoding by
overloading and could do both and more. It has a lot of potential to be
an intermediate step for encoding as well as being
M.-A. Lemburg wrote:
I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode() method *do* check the
Michael Hudson wrote:
There's one extremely significant example where the *value* of
something impacts on the type of something else: functions. The types
of everything involved in str([1]) and len([1]) are the same but the
results are different. This shows up in PyPy's type annotation; most
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode()
M.-A. Lemburg wrote:
True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.
I forgot that: what is the rationale for that restriction?
To assure that only
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.
I forgot that: what is the rationale for that restriction?
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
[snip]
Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something?
This was just an example of one way that might work, but here are my
thoughts on
Aahz wrote:
On Sat, Feb 18, 2006, Ron Adam wrote:
I like the bytes.recode() idea a lot. +1
It seems to me it's a far more useful idea than encoding and decoding by
overloading and could do both and more. It has a lot of potential to be
an intermediate step for encoding as well as being
On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:
It's by no means a Perl attitude.
In your eyes, perhaps. It certainly feels that way to me (or I wouldn't have
said it :). Perl happens to be full of general constructs that were added
because they were easy to add, or they were
Josiah Carlson [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something?
Are we going to need to embed the direction in the encoding/decoding
name
Josiah Carlson wrote:
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
[snip]
Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something?
This was just an example of one way that might work, but here
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
Ron Adam [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
[snip]
Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something?
This was just an
Josiah Carlson wrote:
Ron Adam [EMAIL PROTECTED] wrote:
Except that makes it even more ambiguous.
Is encodings.tounicode() encoding, or decoding? According to everything
you have said so far, it would be decoding. But if I am decoding binary
data, why should it be spending any time as a
Josiah Carlson wrote:
I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
and likely a few others that the two of you may be arguing against
should stay as encodings, because strictly speaking, they are defined as
encodings of data. They may not be encodings of _unicode_
On 2/15/06, Guido van Rossum [EMAIL PROTECTED] wrote:
Actually users trying to figure out Unicode would probably be better
served if bytes.encode() and text.decode() did not exist. [...] It would
be better if the signature of text.encode() always returned a
bytes object. But why deny the bytes
On Feb 16, 2006, at 9:20 PM, Josiah Carlson wrote:
Greg Ewing [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
They may not be encodings of _unicode_ data,
But if they're not encodings of unicode data, what
business do they have being available through
someunicodestring.encode(...)?
I
Guido == Guido van Rossum [EMAIL PROTECTED] writes:
Guido I'd say there are two symmetric API flavors possible (t
Guido and b are text and bytes objects, respectively, where text
Guido is a string type, either str or unicode; enc is an encoding
Guido name):
Guido -
Martin v. Löwis wrote:
Josiah Carlson wrote:
I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
and likely a few others that the two of you may be arguing against
should stay as encodings, because strictly speaking, they are defined as
encodings of data. They may not
On Fri, 17 Feb 2006 00:33:49 +0100, "Martin v. Löwis"
[EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
and likely a few others that the two of you may be arguing against
should stay as encodings,
M.-A. Lemburg wrote:
Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.
No. The reason to ban string.decode and bytes.encode is that
it confuses users.
Regards,
Martin
Martin v. Löwis [EMAIL PROTECTED] wrote:
M.-A. Lemburg wrote:
Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.
No. The reason to ban string.decode and bytes.encode is
On Fri, 17 Feb 2006 21:35:25 +0100, "Martin v. Löwis"
[EMAIL PROTECTED] wrote:
M.-A. Lemburg wrote:
Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.
Josiah Carlson wrote:
How are users confused?
Users do
py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)
because they want to convert the string to
Martin v. Löwis wrote:
Users do
py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)
because they want to convert the string to Unicode, and they
Martin v. Löwis [EMAIL PROTECTED] wrote:
Josiah Carlson wrote:
How are users confused?
Users do
py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)
Ian Bicking wrote:
That str.encode(unicode_encoding) implicitly decodes strings seems like
a flaw in the unicode encodings, quite separate from the existence of
str.encode. I for one really like s.encode('zlib').encode('base64') --
and if the zlib encoding raised an error when it was passed a
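Ian's `s.encode('zlib').encode('base64')` chain is Python 2 spelling; the equivalent chain survives in Python 3 through the codecs module's bytes-to-bytes transforms:

```python
import codecs

payload = b"x" * 100
# compress, then armor as base64 bytes (7-bit safe)
wire = codecs.encode(codecs.encode(payload, "zlib"), "base64")
# unwind in the opposite order
assert codecs.decode(codecs.decode(wire, "base64"), "zlib") == payload
```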
Josiah Carlson wrote:
If some users
can't understand this (passing different arguments to a function may
produce different output),
It's worse than that. The return *type* depends on the *value* of
the argument. I think there is little precedence for that: normally,
the return values depend on