On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:
>>> "abc".encode("latin-1")
'abc'
>>> "abc".decode("latin-1")
u'abc'
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
It's the consequences: nobody complains about tacking const on to a
former honest-to-God char * argument that was in fact not modified,
because that's not only helpful for C++ programmers, it's _harmless_
for all programmers. For example, nobody
On 2/13/06, Fred L. Drake, Jr. [EMAIL PROTECTED] wrote:
On Monday 13 February 2006 10:03, Georg Brandl wrote:
The above docs are from August 2005 while docs.python.org/dev is current.
Shouldn't the old docs be removed?
I'm afraid I've generally been too busy to chime in much on this
Guido van Rossum wrote:
[snip..]
In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no
such thing as default encoding anymore, as there's no str object.
- bytes("\x80", encoding="latin-1"), you get a bytestring
Smith wrote:
computing the bin boundaries for a histogram
where bins are a width of 0.1:
>>> for i in range(20):
...     if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
...         print i,repr(i*.1),repr(i/10.),i*.1,i/10.
I don't see how that has any relevance to the way bin boundaries
would be used in
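For readers following the bin-boundary example, here is a minimal sketch in modern Python 3 (not the Python 2 of the thread) that lists the indices where the two spellings of the same boundary disagree. The helper name `mismatched_boundaries` is made up for illustration, and the proposed nice() function is deliberately left out because its definition is not shown in this excerpt.

```python
def mismatched_boundaries(count):
    """Return the indices i in range(count) where i * 0.1 != i / 10.0."""
    return [i for i in range(count) if i * 0.1 != i / 10.0]


mismatches = mismatched_boundaries(20)
for i in mismatches:
    # repr() shows the full binary-float value of each spelling.
    print(i, repr(i * 0.1), repr(i / 10.0))
```

Any index that shows up here is a boundary whose value depends on how it was computed, which is the situation the histogram example is about.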
Guido van Rossum wrote:
I also wonder if having a b"..." literal would just add more confusion
-- bytes are not characters, but b"..." makes it appear as if they
are.
I'm inclined to agree. Bytes objects are more likely to be used
for things which are *not* characters -- if they're characters,
Guido van Rossum wrote:
There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects.
My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.
Greg
Guido van Rossum wrote:
In general I've come to appreciate that there are two ways of
converting an object of type A to an object of type B: ask an A
instance to convert itself to a B, or ask the type B to create a new
instance from an A.
And the difference between the two isn't even always
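The two conversion directions described above can be sketched as follows, assuming a modern Python 3 interpreter (which postdates this thread). The `Packet` class is hypothetical and exists only to show how the constructor spelling can end up delegating to the instance.

```python
text = "abc"

# Ask the A instance (a str) to convert itself to a B (bytes).
via_method = text.encode("latin-1")

# Ask the type B (bytes) to create a new instance from an A.
via_constructor = bytes(text, "latin-1")


class Packet:
    """Hypothetical type that knows how to render itself as bytes."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        # bytes(packet) calls this hook, so the "ask the type" spelling
        # is implemented by asking the instance after all.
        return self.payload.encode("latin-1")


via_hook = bytes(Packet("abc"))
```

All three spellings produce the same value here, which is one way the line between the two approaches gets blurry.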
Guido van Rossum wrote:
On 2/10/06, Mark Russell [EMAIL PROTECTED] wrote:
On 10 Feb 2006, at 12:45, Nick Coghlan wrote:
An alternative would be to call it __discrete__, as that is the key
characteristic of an indexing type - it consists of a sequence of discrete
values that can be
On 2/14/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
Adam Olsen wrote:
What would that imply for repr()? To support eval(repr(x))
I don't think eval(repr(x)) needs to be supported for the bytes
type. However, if that is desirable, it should return something
like
bytes([1,2,3])
I'm
Thanks to all for a rather insightful discussion, it's always fun to
learn that after 28 years of C programming the language still has
little corners that I know absolutely nothing about:-)
Practically speaking, though, I've adopted MAL's solution for the
time being:
/* The keyword array
On 2/14/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
It's the consequences: nobody complains about tacking const on to a
former honest-to-God char * argument that was in fact not modified,
because that's not only helpful for C++ programmers, it's
Greg Ewing [EMAIL PROTECTED] writes:
Guido van Rossum wrote:
There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects.
My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.
Oh yes.
On 2/14/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
Jeremy Hylton wrote:
The compiler in question is gcc and the warning can be turned off with
-Wno-write-strings. I think we'd be better off leaving that option
on, though. This warning will help me find places where I'm passing a
On Feb 14, 2006, at 6:35 AM, Greg Ewing wrote:
Barry Warsaw wrote:
This makes me think I want an unsigned byte type, which b[0] would
return.
Come to think of it, this is something I don't
remember seeing discussed. I've been thinking
that bytes[i] would return an integer, but is
the
On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I was just pointing out that since byte strings are bytes by
definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So, using latin-1 when converting a string to
At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I was just pointing out that since byte strings are bytes by
definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So,
James Y Knight wrote:
Kill the encoding argument, and you're left with:
Python2.X:
- bytes(bytes_object) - copy constructor
- bytes(str_object) - copy the bytes from the str to the bytes object
- bytes(sequence_of_ints) - make bytes with the values of the ints,
error on overflow
However I do dislike the name nice() - there is already a nice() in the
os module with a fairly well understood function. But I'm sure some
Presumably it would be located somewhere like the math module.
For sure, but let's avoid as many name clashes as we can.
Python is very good at managing
James Y Knight [EMAIL PROTECTED] wrote:
I like it, it makes sense. Unicode strings are simply not allowed as
arguments to the byte constructor. Thinking about it, why would it be
otherwise? And if you're mixing str-strings and unicode-strings, that
means the str-strings you're sometimes
On 2/12/06, Alan Gauld [EMAIL PROTECTED] wrote:
However I do dislike the name nice() - there is already a nice() in the
os module with a fairly well understood function. But I'm sure some
Presumably it would be located somewhere like the math module.
For sure, but let's avoid as many name
Guido van Rossum wrote:
On 2/13/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
It'd be cruel and unusual punishment though to have to write
bytes("abc", "Latin-1")
I propose that the default encoding (for basestring instances) ought
to be ascii just like everywhere else.
It doesn't seem to me that math.nice has an obvious meaning.
Regards,
Michael
On 2/14/06, Crutcher Dunnavant [EMAIL PROTECTED] wrote:
On 2/12/06, Alan Gauld [EMAIL PROTECTED] wrote:
However I do dislike the name nice() - there is already a nice() in the
os module with a fairly well
On Feb 14, 2006, at 11:47 AM, M.-A. Lemburg wrote:
The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.
That is true. And I think that is correct. There
On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
I like it, it makes sense. Unicode strings are simply not allowed as
arguments to the byte constructor. Thinking about it, why would it be
otherwise? And if you're mixing str-strings and
On 2/14/06, Fuzzyman [EMAIL PROTECTED] wrote:
In Python 3K, when the string data-type has gone,
Technically it won't be gone; str will mean what it already means in
Jython and IronPython (for which CPython uses unicode in 2.x).
what will
``open(filename).read()`` return ?
Since you didn't
On 2/13/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
I'm actually opposed to bdist_egg, from a conceptual point of view.
I think it is wrong if Python creates its own packaging format
(just as it was wrong that Java created jar files - but they are
without deployment procedures even today).
I
(Disclaimer: I'm not currently promoting the addition of bdist_egg or any
egg-specific features for the 2.5 timeframe, but neither am I
opposed. This message is just to clarify a few points and questions under
discussion, not to advocate a particular outcome. If you read this and
think you
Guido van Rossum wrote:
what will
``open(filename).read()`` return ?
Since you didn't specify an open mode, it'll open it as a text file
using some default encoding (or perhaps it can guess the encoding from
file metadata -- this is all OS specific). So it'll return a string.
If you
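As a rough illustration of the text-versus-binary distinction described here, the sketch below uses today's Python 3, where this design eventually landed. The encoding is passed explicitly so the result does not depend on a platform default, which is an assumption added for portability rather than something stated in the message.

```python
import os
import tempfile

# Create a scratch file holding one non-ASCII character.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")

    # Text mode: contents are decoded and returned as a string.
    with open(path, encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: contents come back undecoded, as bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)
```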
On 2/14/06, Michael Walter [EMAIL PROTECTED] wrote:
It doesn't seem to me that math.nice has an obvious meaning.
I don't disagree, I think math.nice is a terrible name. I was
objecting to the desire to try to come up with interesting, different
names in every module namespace.
Regards,
On Tue, Feb 14, 2006 at 11:16:32AM -0800, Guido van Rossum wrote:
Well, just like Java, if you have pure Python code, why should a
developer have to duplicate the busy-work of creating distributions
for different platforms? (Especially since there are so many different
target platforms --
Guido van Rossum [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
In private email, Phillip Eby suggested to add these things to the
2.5. standard library:
bdist_deb, bdist_msi, and friends
He explained them as follows:
bdist_deb makes .deb files (packages for Debian-based
On 2/14/06, Just van Rossum [EMAIL PROTECTED] wrote:
...
Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).
What about shorter names, such as 'text'
On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:
What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'? By eschewing the 'open' prefix we
might make it easy to eventually migrate off it. Maybe text and data
could be two subclasses of file,
On 2/14/06, Just van Rossum [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
[...] surely text files are more commonly used, and surely the
most common operation should have the shorter name -- call it the
Huffman Principle.
+1 for two functions.
My choice would be open() for binary and
I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the mean time, to save you reading through
all those responses, here's a summary of where I believe I stand.
Let's continue the
On 2/14/06, Thomas Wouters [EMAIL PROTECTED] wrote:
On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:
>>> "abc".encode("latin-1")
On 2/14/06, Adam Olsen [EMAIL PROTECTED] wrote:
I'm starting to wonder, do we really need anything fancy? Wouldn't it
be sufficient to have a way to compactly store 8-bit integers?
In 2.x we could convert unicode like this:
bytes(ord(c) for c in u"It's".encode('utf-8'))
Yuck.
On 2/13/06, Barry Warsaw [EMAIL PROTECTED] wrote:
This makes me think I want an unsigned byte type, which b[0] would
return. In another thread I think someone mentioned something about
fixed width integral types, such that you could have an object that
was guaranteed to be 8-bits wide,
On 2/13/06, Adam Olsen [EMAIL PROTECTED] wrote:
What would that imply for repr()? To support eval(repr(x)) it would
have to produce whatever format the source code includes to begin
with.
I'm not sure that's a requirement. (I do think that in 2.x,
str(bytes(s)) == s should hold as long as
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
I didn't mean that it was the only purpose. In Python 2.x, practical code
has to sometimes deal with string-like objects. That is,
On 2/14/06, Barry Warsaw [EMAIL PROTECTED] wrote:
A related question: what would bytes([104, 101, 108, 108, 111, 8004])
return? An exception hopefully.
Absolutely.
I also think you'd want bytes([x
for x in some_bytes_object]) to return an object equal to the original.
You mean if
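The two expectations in this exchange can be checked against the bytes type in current Python 3, which is a later design than the one being debated here. The helper `try_bytes` is a made-up name used only to capture the exception for inspection.

```python
def try_bytes(values):
    """Return bytes(values), or the ValueError raised for out-of-range ints."""
    try:
        return bytes(values)
    except ValueError as exc:
        return exc


ok = try_bytes([104, 101, 108, 108, 111])
bad = try_bytes([104, 101, 108, 108, 111, 8004])

# Iterating a bytes object yields ints, so rebuilding it gives an equal object.
round_trip = bytes([x for x in ok])
```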
On Feb 14, 2006, at 2:05 PM, Joe Smith wrote:
Guido van Rossum [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
In private email, Phillip Eby suggested to add these things to the
2.5. standard library:
bdist_deb, bdist_msi, and friends
He explained them as follows:
On Tue, Feb 14, 2006 at 05:48:57PM -0500, Barry Warsaw wrote:
On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:
What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'? By eschewing the 'open' prefix we
might make it easy to eventually
On Tue, Feb 14, 2006 at 05:05:08PM -0500, Joe Smith wrote:
I don't like the idea of bdist_deb very much.
The idea behind the debian packaging system is that unlike with RPM and
Windows, package management should be clean.
The idea behind RPM is also that package management should be clean.
On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the mean time, to save you reading through
all those responses, here's a
On Tue, Feb 14, 2006 at 03:13:37PM -0800, Guido van Rossum wrote:
Also, bytes objects are (in my mind anyway) mutable. We have no other
literal notation for mutable objects. What would the following code
print?
for i in range(2):
    b = b"abc"
    print b
    b[0] = ord("A")
Would the
At 03:14 PM 2/14/2006 -0800, Bob Ippolito wrote:
I'm also not sure what the uninstallation story
with scripts is.
The scripts have enough breadcrumbs in them that you can figure out what
egg they go with. More precisely, an egg contains enough information for
you to search PATH for its scripts
On 2/14/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X. That spelling would provide a way of ensuring the type
of the return value.
At the cost of an extra copying step.
[Guido]
You missed the part where I said
On Tue, 2006-02-14 at 15:13 -0800, Guido van Rossum wrote:
So I'm taking that the specific properties you want to model are the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit more tricky to define but similar. These
never overflow but
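A small sketch of the wrap-around semantics mentioned above. The unsigned case follows the "mod 2**N" definition directly; the signed case assumes two's-complement wrapping, which the message does not spell out. Both function names are invented for this example.

```python
def wrap_unsigned(value, bits):
    """Reduce value modulo 2**bits, giving a result in range(2**bits)."""
    return value % (1 << bits)


def wrap_signed(value, bits):
    """Two's-complement wrap: result lies in [-2**(bits-1), 2**(bits-1))."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half
```

With 8 bits, `wrap_unsigned(256, 8)` comes back as 0 and `wrap_signed(128, 8)` comes back as -128, so neither operation ever raises an overflow error.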
On 2/14/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
Maybe you should ask your coworkers. :-) I think gmail is trying to
do something intelligent with the Mail-Followup-To header.
But you're the only person for whom it does that. Do you have a funny
gmail setting?
--
--Guido van Rossum (home
On 2/14/06, Bob Ippolito [EMAIL PROTECTED] wrote:
On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
- we need a new PEP; PEP 332 won't cut it
- no b"..." literal
- bytes objects are mutable
- bytes objects are composed of ints in range(256)
- you can pass any iterable of ints
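The properties listed so far (mutable, made of ints in range(256), buildable from any iterable of ints) can be tried out with bytearray in today's Python 3. That type is used here only as the closest available stand-in; it is not claimed to be the exact object proposed in this message.

```python
# Any iterable of ints works as the constructor argument.
buf = bytearray(ord(c) for c in "abc")

# Indexing yields an int, not a one-character string.
first = buf[0]

# The object is mutable, so item assignment changes it in place.
buf[0] = ord("A")
```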
On Tue, Feb 14, 2006 at 03:13:25PM -0800, Guido van Rossum wrote:
Martin von Loewis's alternative for the very controversial set is to
disallow an encoding argument and (I believe) also to disallow Unicode
arguments. In 3.0 this would leave us with s.encode(encoding) as the
only way to
Jeremy Hylton wrote:
Perhaps there is some value in finding functions which ought to expect
const char*. For that, occasional checks should be sufficient; I cannot
see a point in having code permanently pass with that option. In
particular not if you are interfacing with C libraries.
I don't
Greg Ewing [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
I don't think you're doing anyone any favours by trying to protect
them from having to know about these things, because they *need* to
know about them if they're not to write algorithms that seem to
work fine on tests but
On 2/14/06, Neil Schemenauer nas at arctrix.com wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.
Guido wrote:
At the cost of an extra copying step.
That sounds like an implementation issue. If it is important
enough to matter, then why not just add
[Guido van Rossum]
Somewhat controversial:
- bytes("abc") == bytes(map(ord, "abc"))
At first glance, this seems obvious and necessary, so if it's somewhat
controversial, then I'm missing something. What's the issue?
Raymond
Bob Ippolito wrote:
Martin von Loewis's alternative for the very controversial set is to
disallow an encoding argument and (I believe) also to disallow Unicode
arguments. In 3.0 this would leave us with s.encode(encoding) as the
only way to convert a string (which is always unicode) to bytes. The
Thomas Wouters wrote:
Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
writes a regular distutils setup.py, and I can install the latest,
not-quite-in-apt version by doing 'setup.py bdist_deb' and installing the
resulting .deb.
Why not just do 'setup.py install'
On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:
On 2/14/06, Bob Ippolito [EMAIL PROTECTED] wrote:
On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
- we need a new PEP; PEP 332 won't cut it
- no b"..." literal
- bytes objects are mutable
- bytes objects are composed of ints in
On Wed, Feb 15, 2006 at 01:51:03PM +1300, Greg Ewing wrote:
Thomas Wouters wrote:
Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
writes a regular distutils setup.py, and I can install the latest,
not-quite-in-apt version by doing 'setup.py bdist_deb' and
Joe Smith wrote:
Windows and RPM are known for major dependency problems, letting packages
damage each other, having packages that do not uninstall cleanly (i.e.
packages that leave junk all over the place) and generally messing the system
up quite badly over time, so that the OS is
Alex Martelli wrote:
What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'?
Because those words are just names for pieces of data,
with nothing to connect them with files or the act of
opening a file.
I think the association of open with file is
Raymond Hettinger wrote:
- bytes("abc") == bytes(map(ord, "abc"))
At first glance, this seems obvious and necessary, so if it's somewhat
controversial, then I'm missing something. What's the issue?
There is an implicit Latin-1 assumption in that code. Suppose
you do
# -*- coding: koi-8r -*-
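To make the implicit Latin-1 assumption concrete, here is a hedged sketch in modern Python 3, where strings are Unicode. The two sample characters are chosen for illustration and do not come from the message; `koi8_r` is the standard-library codec name for KOI8-R.

```python
latin_char = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, code point 233
cyrillic_char = "\u0430"   # CYRILLIC SMALL LETTER A, code point 1072

# For code points below 256, map(ord, ...) and Latin-1 encoding agree.
same_as_latin1 = bytes(map(ord, latin_char)) == latin_char.encode("latin-1")

# KOI8-R stores the Cyrillic letter in a single byte ...
koi8_bytes = cyrillic_char.encode("koi8_r")

# ... but its code point does not fit in a byte, so map(ord, ...) fails.
try:
    bytes(map(ord, cyrillic_char))
    ord_failed = False
except ValueError:
    ord_failed = True
```

In other words, building bytes from code points only matches an encoding when that encoding is Latin-1.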
On 2/14/06, Jim Jewett [EMAIL PROTECTED] wrote:
On 2/14/06, Neil Schemenauer nas at arctrix.com wrote:
People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.
Guido wrote:
At the cost of an extra copying step.
That sounds like an implementation issue. If it is
Guido van Rossum wrote:
The only remaining question is what if anything to do with an
encoding argument when the first argument is of type str...)
From what you said earlier about str in 2.x being
interpretable as a unicode string which contains
only ascii, it seems to me that if you say
On Feb 14, 2006, at 5:00 PM, Greg Ewing wrote:
Joe Smith wrote:
Windows and RPM are known for major dependency problems, letting
packages
damage each other, having packages that do not uninstall cleanly
(i.e.
packages that leave junk all over the place) and generally messing
the
On Wed, Feb 15, 2006 at 02:00:21PM +1300, Greg Ewing wrote:
Joe Smith wrote:
Windows and RPM are known for major dependency problems, letting packages
damage each other, having packages that do not uninstall cleanly (i.e.
packages that leave junk all over the place) and generally
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
What would bytes("abc\xf0", "latin-1") *mean*?
I'm saying that XXX would be the same encoding as you specified.
On Feb 14, 2006, at 5:22 PM, Trent Mick wrote:
[Greg Ewing wrote]
MacOSX seems to be the only system so far that has got
this right -- organising the system so that everything
related to a given application or library can be kept
under a single directory, clearly labelled with a
version
Trent Mick wrote:
ActivePython and MacPython have to install stuff to:
/usr/local/bin/...
/Library/Frameworks/Python.framework/...
/Applications/MacPython-2.4/... # just MacPython does this
It's not perfect, but it's still a lot better than the
situation on any other unix I've
Thomas Wouters wrote:
Well, as an end user, I honestly don't care.
As a programmer, I also don't care.
Perhaps I've been burned once too often by someone's
oh-so-clever installer script screwing up and leaving
me to wade through an impenetrable pile of makefiles,
shell scripts and m4 macros
Thomas Wouters wrote:
The encoding of network streams or files may be
entirely unknown beforehand, and depend on the content: a content-encoding,
a META EQUIV HTML tag. Will bytes-strings get string methods for easy
searching of content descriptors?
Seems to me this is a case where you want
Guido van Rossum wrote:
I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the mean time, to save you reading through
all those responses, here's a summary of where I believe I stand.
Greg Ewing wrote:
Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
What would bytes("abc\xf0", "latin-1") *mean*?
I'm saying that XXX would be the same encoding
After some revisions, PEP 357 is ready for more comments. Please voice
any concerns.
-Travis
PEP: 357
Title: Allowing Any Object to be Used for Slicing
Version: $Revision: 42367 $
Last Modified: $Date: 2006-02-14 18:12:07 -0700 (Tue, 14 Feb 2006) $
Author: Travis Oliphant [EMAIL PROTECTED]
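For context on what "any object used for slicing" looks like in practice, the sketch below uses the `__index__` special method, which is the hook that resulted from this PEP in released Python. The `Offset` class is hypothetical and not taken from the PEP text.

```python
class Offset:
    """Hypothetical integer-like type used only for this illustration."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called when the object is used as a sequence index or slice bound.
        return self.value


letters = "abcdef"
sliced = letters[Offset(1):Offset(4)]
picked = letters[Offset(2)]
```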
Attached is the 2.5 release PEP 356. It's also available from:
http://www.python.org/peps/pep-0356.html
Does anyone have any comments? Is this good or bad? Feel free to
send me comments.
We need to ensure that PEPs 308, 328, and 343 are implemented. We
have possible volunteers for 308
Ron Adam wrote:
My first impression and thoughts were: (and seems incorrect now)
bytes(object) - byte sequence of object's value
Basically a memory dump of object's value.
As I understand the current intentions, this is correct.
The bytes constructor would have two different
Fred L. Drake, Jr. wrote:
The proper response in this case is often to re-start decoding
with the correct encoding, since some of the data extracted so far may have
been decoded incorrectly.
If the protocol has been sensibly designed, that shouldn't
happen, since everything up to the coding
On Wednesday 15 February 2006 01:44, Greg Ewing wrote:
If the protocol has been sensibly designed, that shouldn't
happen, since everything up to the coding marker should
be ascii (or some other protocol-defined initial coding).
Indeed.
For protocols that are not sensibly designed (or if