M == M.-A. Lemburg [EMAIL PROTECTED] writes:
M James Y Knight wrote:
Nice and simple.
M Albeit, too simple.
M The above approach would basically remove the possibility to
M easily create bytes() from literals in Py3k, since literals in
M Py3k create Unicode objects,
Guido van Rossum wrote:
If bytes support the buffer interface, we get another interesting
issue -- regular expressions over bytes. Brr.
We already have that:
import re, array
re.search('\2', array.array('B', [1, 2, 3, 4])).group()
array('B', [2])
Not sure whether to blame array
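For comparison, a minimal sketch of how regular expressions over bytes behave in present-day Python 3, which is not the 2006 proposal under discussion. The variable names are illustrative only.

```python
import re

data = bytes([1, 2, 3, 4])

# A bytes pattern can search a bytes subject; the match is itself bytes.
match = re.search(b"\x02", data)
found = match.group()

# Mixing a str pattern with a bytes subject is rejected with TypeError.
try:
    re.search("\x02", data)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```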
On Tue, 14 Feb 2006 12:31:07 -0700, Neil Schemenauer [EMAIL PROTECTED] wrote:
On Mon, Feb 13, 2006 at 08:07:49PM -0800, Guido van Rossum wrote:
On 2/13/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
"\x80".encode('latin-1')
But in 2.5 we can't change that to return a bytes object without
On Tue, 14 Feb 2006 15:14:07 -0800, Guido van Rossum [EMAIL PROTECTED] wrote:
On 2/14/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
As Phillip guessed, I was indeed thinking about introducing bytes()
sooner than that, perhaps even in 2.5 (though I don't want anything
On 2/14/06, Neil Schemenauer wrote:
People could spell it bytes(s.encode('latin-1'))
Guido wrote:
At the cost of an extra copying step.
I asked:
... why not just add some smarts to the bytes constructor?
Guido wrote:
... the VM usually keeps an extra reference
on the stack so the refcount
Ron Adam [EMAIL PROTECTED] wrote:
Greg Ewing wrote:
Ron Adam wrote:
b = bytes(0L) -> bytes([0,0,0,0])
No, bytes(0L) --> TypeError because 0L doesn't implement
the iterator protocol or the buffer interface.
It wouldn't need it if it was a direct C memory copy.
Yes it would.
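For reference, a small sketch of how Python 3 as eventually released treats an integer argument. This is an observation about today's behaviour, not what either poster proposed: the integer is read as a length, and a memory-dump style conversion has to be requested explicitly through int.to_bytes().

```python
# An int argument is a length: the result is that many zero bytes.
zero_filled = bytes(4)
empty = bytes(0)

# Dumping the value of an integer needs an explicit size and byte order.
packed_zero = (0).to_bytes(4, byteorder="little")
packed_258 = (258).to_bytes(4, byteorder="big")   # 258 == 0x0102
```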
On Wed, Feb 15, 2006 at 01:38:41PM -0500, Jim Jewett wrote:
On 2/14/06, Neil Schemenauer wrote:
People could spell it bytes(s.encode('latin-1'))
Guido wrote:
At the cost of an extra copying step.
I asked:
... why not just add some smarts to the bytes constructor?
Guido wrote:
Ron Adam wrote:
I was presuming it would be done in C code and it will just need a
pointer to the first byte, memchr(), and then read n bytes directly into
a new memory range via memcpy().
If the object supports the buffer interface, it can be
done that way. But if not, it would seem to
Greg Ewing wrote:
I think you don't understand what an encoding is. Unicode
strings don't *have* an encoding, because they're not encoded!
Encoding is what happens when you go from a unicode string
to something else.
Ah.. ok, my mental picture was a bit off. I had this reversed somewhat.
On Tue, Feb 14, 2006, Guido van Rossum wrote:
Anyway, I'm now convinced that bytes should act as an array of ints,
where the ints are restricted to range(0, 256) but have type int.
range(0, 255)?
--
Aahz ([EMAIL PROTECTED]) * http://www.pythoncraft.com/
19. A language that
On Feb 15, 2006, at 6:35 PM, Aahz wrote:
On Tue, Feb 14, 2006, Guido van Rossum wrote:
Anyway, I'm now convinced that bytes should act as an array of ints,
where the ints are restricted to range(0, 256) but have type int.
range(0, 255)?
No, Guido was correct. range(0, 256) is [0, 1, 2,
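The half-open convention behind this correction can be checked directly. A short sketch in Python 3, where range() is a lazy sequence rather than a list:

```python
valid = range(0, 256)

# The stop value is excluded, so 255 is the largest member.
largest = valid[-1]
has_255 = 255 in valid
has_256 = 256 in valid
count = len(valid)
```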
On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:
>>> "abc".encode("latin-1")
'abc'
>>> "abc".decode("latin-1")
u'abc'
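By way of contrast, a sketch of the same two calls in present-day Python 3, where the confusion described here cannot arise because each type has only one of the two methods. This reflects the language as shipped, not the state of the discussion in 2006.

```python
# Text encodes to bytes; bytes decode to text.
encoded = "abc".encode("latin-1")
decoded = b"abc".decode("latin-1")

# There is no str.decode() and no bytes.encode() to mix up.
str_has_decode = hasattr("abc", "decode")
bytes_has_encode = hasattr(b"abc", "encode")
```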
Guido van Rossum wrote:
I also wonder if having a b"..." literal would just add more confusion
-- bytes are not characters, but b"..." makes it appear as if they
are.
I'm inclined to agree. Bytes objects are more likely to be used
for things which are *not* characters -- if they're characters,
Guido van Rossum wrote:
There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects.
My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.
Greg
Guido van Rossum wrote:
In general I've come to appreciate that there are two ways of
converting an object of type A to an object of type B: ask an A
instance to convert itself to a B, or ask the type B to create a new
instance from an A.
And the difference between the two isn't even always
On 2/14/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
Adam Olsen wrote:
What would that imply for repr()? To support eval(repr(x))
I don't think eval(repr(x)) needs to be supported for the bytes
type. However, if that is desirable, it should return something
like
bytes([1,2,3])
I'm
Greg Ewing [EMAIL PROTECTED] writes:
Guido van Rossum wrote:
There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects.
My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.
Oh yes.
On Feb 14, 2006, at 6:35 AM, Greg Ewing wrote:
Barry Warsaw wrote:
This makes me think I want an unsigned byte type, which b[0] would
return.
Come to think of it, this is something I don't
remember seeing discussed. I've been thinking
that bytes[i] would return an integer, but is
the
On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I was just pointing out that since byte strings are bytes by
definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So, using latin-1 when converting a string to
At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I was just pointing out that since byte strings are bytes by
definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So,
James Y Knight wrote:
Kill the encoding argument, and you're left with:
Python2.X:
- bytes(bytes_object) - copy constructor
- bytes(str_object) - copy the bytes from the str to the bytes object
- bytes(sequence_of_ints) - make bytes with the values of the ints,
error on overflow
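A rough sketch of the three rules listed above, written for Python 3. The helper name strict_bytes is hypothetical, and mapping the Python 2 "str" case onto Python 3 bytes-like objects is an assumption made for illustration. Text is refused outright, matching the idea of dropping the encoding argument.

```python
def strict_bytes(obj):
    """Hypothetical constructor without an encoding argument."""
    if isinstance(obj, str):
        # Text is never encoded implicitly; the caller must encode it first.
        raise TypeError("encode text explicitly before building bytes")
    if isinstance(obj, (bytes, bytearray, memoryview)):
        # Existing byte data: take the bytes as they are.
        return bytes(obj)
    values = list(obj)
    for value in values:
        if not 0 <= value <= 255:
            # "error on overflow" from the list above.
            raise OverflowError("byte value out of range: %r" % (value,))
    return bytes(values)
```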
James Y Knight [EMAIL PROTECTED] wrote:
I like it, it makes sense. Unicode strings are simply not allowed as
arguments to the byte constructor. Thinking about it, why would it be
otherwise? And if you're mixing str-strings and unicode-strings, that
means the str-strings you're sometimes
Guido van Rossum wrote:
On 2/13/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
It'd be cruel and unusual punishment though to have to write
bytes("abc", "Latin-1")
I propose that the default encoding (for basestring instances) ought
to be ascii just like everywhere else.
On Feb 14, 2006, at 11:47 AM, M.-A. Lemburg wrote:
The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.
That is true. And I think that is correct. There
On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
I like it, it makes sense. Unicode strings are simply not allowed as
arguments to the byte constructor. Thinking about it, why would it be
otherwise? And if you're mixing str-strings and
On 2/14/06, Thomas Wouters [EMAIL PROTECTED] wrote:
On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:
>>> "abc".encode("latin-1")
On 2/14/06, Adam Olsen [EMAIL PROTECTED] wrote:
I'm starting to wonder, do we really need anything fancy? Wouldn't it
be sufficient to have a way to compactly store 8-bit integers?
In 2.x we could convert unicode like this:
bytes(ord(c) for c in u"It's".encode('utf-8'))
Yuck.
On 2/13/06, Barry Warsaw [EMAIL PROTECTED] wrote:
This makes me think I want an unsigned byte type, which b[0] would
return. In another thread I think someone mentioned something about
fixed width integral types, such that you could have an object that
was guaranteed to be 8-bits wide,
On 2/13/06, Adam Olsen [EMAIL PROTECTED] wrote:
What would that imply for repr()? To support eval(repr(x)) it would
have to produce whatever format the source code includes to begin
with.
I'm not sure that's a requirement. (I do think that in 2.x,
str(bytes(s)) == s should hold as long as
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
I didn't mean that it was the only purpose. In Python 2.x, practical code
has to sometimes deal with string-like objects. That is,
On 2/14/06, Barry Warsaw [EMAIL PROTECTED] wrote:
A related question: what would bytes([104, 101, 108, 108, 111, 8004])
return? An exception hopefully.
Absolutely.
I also think you'd want bytes([x
for x in some_bytes_object]) to return an object equal to the original.
You mean if
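Both expectations can be checked against Python 3 as it exists today, which is offered here only as a point of comparison with the proposal. The out-of-range element raises ValueError, and rebuilding from the elements gives back an equal object.

```python
original = bytes([104, 101, 108, 108, 111])

# Iterating over bytes yields ints, so the list round-trips.
rebuilt = bytes([x for x in original])

# 8004 does not fit in a byte, so construction fails.
try:
    bytes([104, 101, 108, 108, 111, 8004])
    rejected = False
except ValueError:
    rejected = True
```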
On 2/14/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X. That spelling would provide a way of ensuring the type
of the return value.
At the cost of an extra copying step.
[Guido]
You missed the part where I said
On Tue, 2006-02-14 at 15:13 -0800, Guido van Rossum wrote:
So I'm taking that the specific properties you want to model are the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit more tricky to define but similar. These
never overflow but
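The modular definition can be sketched in a few lines of Python. The function names are hypothetical, and the signed variant assumes two's-complement wrapping, which is one common way to make "similar" precise.

```python
def wrap_unsigned(value, bits):
    """Reduce value modulo 2**bits, giving a result in [0, 2**bits)."""
    return value % (1 << bits)


def wrap_signed(value, bits):
    """Two's-complement wrap, giving a result in [-2**(bits-1), 2**(bits-1))."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half
```

With 8 bits, 255 + 1 wraps to 0 in the unsigned case and 127 + 1 wraps to -128 in the signed case.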
On 2/14/06, Neil Schemenauer nas at arctrix.com wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.
Guido wrote:
At the cost of an extra copying step.
That sounds like an implementation issue. If it is important
enough to matter, then why not just add
On 2/14/06, Jim Jewett [EMAIL PROTECTED] wrote:
On 2/14/06, Neil Schemenauer nas at arctrix.com wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.
Guido wrote:
At the cost of an extra copying step.
That sounds like an implementation issue. If it is
Guido van Rossum wrote:
The only remaining question is what if anything to do with an
encoding argument when the first argument is of type str...)
From what you said earlier about str in 2.x being
interpretable as a unicode string which contains
only ascii, it seems to me that if you say
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
What would bytes("abc\xf0", "latin-1") *mean*?
I'm saying that XXX would be the same encoding as you specified.
Greg Ewing wrote:
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
What would bytes("abc\xf0", "latin-1") *mean*?
I'm saying that XXX would be the same encoding
Ron Adam wrote:
My first impression and thoughts were: (and seems incorrect now)
bytes(object) -> byte sequence of object's value
Basically a memory dump of object's value.
As I understand the current intentions, this is correct.
The bytes constructor would have two different
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should suffice, at least initially.
I also wonder if having a
Guido van Rossum wrote:
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should suffice, at least initially.
At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects,
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes
At 10:55 PM 2/13/2006 +0100, M.-A. Lemburg wrote:
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation.
Phillip J. Eby wrote:
Why not just have the constructor be:
bytes(initializer [,encoding])
Where initializer must be either an iterable of suitable integers, or a
unicode/string object. If the latter (i.e., it's a basestring), the
encoding argument would then be required. Then,
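This signature is close to what Python 3 ended up with, so the rule can be sketched against today's built-in. This is a comparison with the shipped language, not evidence of what was decided in this thread.

```python
# An iterable of suitable integers needs no encoding.
from_ints = bytes([97, 98, 99])

# A text initializer must come with an encoding.
from_text = bytes("abc", "ascii")

# Omitting the encoding for text is an error.
try:
    bytes("abc")
    needs_encoding = False
except TypeError:
    needs_encoding = True
```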
On 2/13/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
It'd be cruel and unusual punishment though to have to write
bytes("abc", "Latin-1")
I propose that the default encoding (for basestring instances) ought
to be ascii just like everywhere else. (Meaning, it should
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
Actually, I thought we were talking about adding bytes() in 2.5.
I was.
However, now that you've brought this up, it actually makes perfect sense
to just use latin-1 as the effective encoding for both strings and
unicode. In Python 2.x,
At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
The conversion from Unicode to bytes is different in this
respect, since you are converting from a bigger type to
a smaller one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.
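The 7-bit versus 8-bit difference is easy to demonstrate. A short sketch in Python 3 syntax, with an example string chosen purely for illustration:

```python
text = "caf\xe9"   # U+00E9 is above the ASCII range but below 256

# latin-1 maps code points 0..255 straight to byte values.
as_latin1 = text.encode("latin-1")

# ASCII only covers 0..127, so the same text fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# latin-1 still cannot represent code points above 255.
try:
    "\u20ac".encode("latin-1")
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```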
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
The conversion from Unicode to bytes is different in this
respect, since you are converting from a bigger type to
a smaller one. Choosing latin-1 as default for this
conversion would give you
Phillip J. Eby wrote:
[snip..]
In fact, the 'encoding' argument seems useless in the case of str objects,
and it seems it should default to latin-1 for unicode objects. The only
-1 for having an implicit encode that behaves differently to other
implicit encodes/decodes that happen in
On 2/13/06, Michael Foord [EMAIL PROTECTED] wrote:
Phillip J. Eby wrote:
[snip..]
In fact, the 'encoding' argument seems useless in the case of str objects,
and it seems it should default to latin-1 for unicode objects. The only
-1 for having an implicit encode that behaves differently
On Mon, 2006-02-13 at 15:44 -0800, Guido van Rossum wrote:
The right way to look at this is, as Phillip says, to consider
conversion between str and bytes as not an encoding but a data type
change *only*.
That sounds right to me too.
-Barry
Guido van Rossum wrote:
On 2/13/06, Michael Foord [EMAIL PROTECTED] wrote:
Phillip J. Eby wrote:
[snip..]
In fact, the 'encoding' argument seems useless in the case of str objects,
and it seems it should default to latin-1 for unicode objects. The only
-1 for having an
At 03:23 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
The only
use I see for having an encoding for a 'str' would be to allow confirming
that the input string in fact is valid for that encoding. So,
bytes(some_str,'ascii') would be an
On 2/13/06, Michael Foord [EMAIL PROTECTED] wrote:
Sorry - I meant for the unicode to bytes case. A default encoding that
behaves differently to the current implicit encodes/decodes would be
confusing IMHO.
And I am in agreement with you there (I think only Phillip argued otherwise).
I
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
I didn't mean that it was the only purpose. In Python 2.x, practical code
has to sometimes deal with string-like objects. That is, code that takes
either strings or unicode. If such code calls bytes(), it's going to want
to include an
On Feb 13, 2006, at 7:09 PM, Guido van Rossum wrote:
On 2/13/06, Michael Foord [EMAIL PROTECTED] wrote:
Sorry - I meant for the unicode to bytes case. A default encoding
that
behaves differently to the current implicit encodes/decodes
would be
confusing IMHO.
And I am in agreement
On 2/13/06, James Y Knight [EMAIL PROTECTED] wrote:
So, in python2.X, you have:
- bytes("\x80"), you get a bytestring with a single byte of value
0x80 (when no encoding is specified, and the object is a str, it
doesn't try to encode it at all).
- bytes("\x80", encoding="latin-1"), you get an error,
Guido van Rossum [EMAIL PROTECTED] wrote:
In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no
such thing as default encoding anymore, as there's no str object.
- bytes("\x80", encoding="latin-1"), you get a
On Monday 13 February 2006 21:52, Neil Schemenauer wrote:
Also, I think it would be useful to introduce byte array literals at
the same time as the bytes object. That would allow people to use
byte arrays without having to get involved with all the silly string
encoding confusion.
bytes([0,
On 2/13/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
Guido van Rossum [EMAIL PROTECTED] wrote:
In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no
such thing as default encoding anymore, as there's
On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:
There's one property that bytes, str and unicode all share: type(x[0])
== type(x), at least as long as len(x) >= 1. This is perhaps the
ultimate test for string-ness.
But not perfect, since of course other containers can contain objects
of
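The proposed test is simple to write down. A sketch with a hypothetical helper name, run against Python 3 as shipped, where indexing bytes returns an int and so bytes no longer passes:

```python
def looks_stringy(x):
    """True if x is non-empty and its first item has the same type as x."""
    return len(x) >= 1 and type(x[0]) == type(x)


text_result = looks_stringy("abc")        # "a" is a str
bytes_result = looks_stringy(b"abc")      # 97 is an int in Python 3
flat_list_result = looks_stringy([1, 2])  # 1 is an int

# The imperfection noted above: a list of lists also passes.
nested_list_result = looks_stringy([[1], [2]])
```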
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
I didn't mean that it was the only purpose. In Python 2.x, practical code
has to sometimes deal with string-like objects. That is, code that takes
either strings or unicode. If such
M.-A. Lemburg wrote:
We're talking about Py3k here: "abc" will be a Unicode string,
so why restrict the conversion to 7 bits when you can have 8 bits
without any conversion problems ?
YAGNI. If you have a need for byte string in source code, it will
typically be random bytes, which can be nicely
Phillip J. Eby wrote:
I was just pointing out that since byte strings are bytes by definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So, using latin-1 when converting a string to bytes
actually seems like the One Obvious Way to do it.
Guido van Rossum wrote:
In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no
such thing as default encoding anymore, as there's no str object.
- bytes("\x80", encoding="latin-1"), you get a bytestring with a
On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
bytes(map(ord, str_or_unicode))
In other words, without an encoding, bytes() should simply treat
str and
unicode objects *as if they were a sequence of integers*, and
produce an
error when an integer is out of range. This is a
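A sketch of the sequence-of-integers reading, using Python 3 text strings in place of the 2.x str and unicode types. The helper name is hypothetical. For code points below 256 the result coincides with a latin-1 encode, and anything larger is rejected.

```python
def bytes_from_code_points(text):
    """Treat text as a sequence of integers, one byte per code point."""
    return bytes(map(ord, text))


sample = "abc\xf0"
same_as_latin1 = bytes_from_code_points(sample) == sample.encode("latin-1")

# U+0100 is 256, which is out of range for a byte.
try:
    bytes_from_code_points("\u0100")
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False
```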
Adam Olsen wrote:
What would that imply for repr()? To support eval(repr(x))
I don't think eval(repr(x)) needs to be supported for the bytes
type. However, if that is desirable, it should return something
like
bytes([1,2,3])
Regards,
Martin
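For comparison only: Python 3 as released does support the round trip, using a literal form rather than the bytes([1,2,3]) spelling suggested here. A short sketch:

```python
value = bytes([1, 2, 3])

# repr() produces a b'...' literal with escapes for non-printable bytes.
text = repr(value)

# Evaluating that literal gives back an equal object.
round_tripped = eval(text)
```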
On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum [EMAIL PROTECTED] wrote:
On Sat, 11 Feb 2006 05:08:09 + (UTC), Neil Schemenauer [EMAIL
PROTECTED] The backwards compatibility problems *seem* to be relatively
minor.
I only found one instance of breakage in the standard library.