Re: [Python-Dev] Python 3.x and bytes

2011-06-14 Thread P.J. Eby

At 01:56 AM 6/14/2011 +, exar...@twistedmatrix.com wrote:

On 12:35 am, ncogh...@gmail.com wrote:

On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com wrote:

You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.


I almost mentioned that, although it does violate one of the
unwritten rules of the Zen (in this case, syntax shall not look
like grit on Tim's monitor)


   [CHAR] = b'C'
   [INT]  = b'I'
   ...


Holy carpal tunnel time machine...  That works in 2.3.  (Without the 
'b' of course.)  Didn't know you could just use list syntax like 
that.  It's an extra character to type, and two more shift keyings, 
but brevity isn't always the soul of clarity.
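The one-element unpacking spellings above can be checked quickly. Iterating a bytes object yields ints in Python 3, so each target receives an integer. The names and byte values are just the examples used in this thread.

```python
# Unpacking a one-byte bytes object binds its single element, which is an int.
CHAR, = b'C'      # tuple-target form
[INT] = b'I'      # list-target form, same semantics
(DATE,) = b'D'    # parenthesised tuple form

print(CHAR, INT, DATE)  # 67 73 68

# A length mismatch fails loudly instead of silently binding the wrong value.
try:
    EOH, = b'\r\n'
except ValueError as exc:
    print("ValueError:", exc)
```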


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3.x and bytes

2011-06-14 Thread Ethan Furman

P.J. Eby wrote:

At 01:56 AM 6/14/2011 +, exar...@twistedmatrix.com wrote:

On 12:35 am, ncogh...@gmail.com wrote:

On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com wrote:

You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.


I almost mentioned that, although it does violate one of the
unwritten rules of the Zen (in this case, syntax shall not look
like grit on Tim's monitor)


   [CHAR] = b'C'
   [INT]  = b'I'
   ...


Holy carpal tunnel time machine...  That works in 2.3.  (Without the 'b' 
of course.)  Didn't know you could just use list syntax like that.  It's 
an extra character to type, and two more shift keyings, but brevity 
isn't always the soul of clarity.


I think I like the 'new' tuple-assignment character... ,= !

CHAR,= b'C'
DATE,= b'D'
LOGICAL ,= b'L'

;)

~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-06-14 Thread Łukasz Langa
Message written by Ethan Furman on 2011-06-14, at 19:46:

   [CHAR] = b'C'
   [INT]  = b'I'


 CHAR,= b'C'
 DATE,= b'D'
 LOGICAL ,= b'L'


Perl Jam!

-- 
Best regards,
Łukasz Langa
tel. +48 791 080 144
WWW http://lukasz.langa.pl/



Re: [Python-Dev] Python 3.x and bytes

2011-06-13 Thread Ethan Furman
Thank you all for the responses.  Rather than reply to each, I just made 
one big summary.  :)



Martin v. Löwis wrote:
 Ethan Furman wrote:
 # constants

 EOH  = b'\r'[0]
 CHAR = b'C'[0]
 DATE = b'D'[0]
 FLOAT = b'F'[0]
 INT = b'I'[0]
 LOGICAL = b'L'[0]
 MEMO = b'M'[0]
 NUMBER = b'N'[0]

 This is not beautiful code.

 In this case, I think the intent would be better captured with

 def ASCII(c):
     return c.encode('ascii')

 EOH     = ASCII('\r') # 0D
 CHAR    = ASCII('C')  # 43
 DATE    = ASCII('D')  # 44
 FLOAT   = ASCII('F')  # 46
 INT     = ASCII('I')  # 49
 LOGICAL = ASCII('L')  # 4C
 MEMO    = ASCII('M')  # 4D
 NUMBER  = ASCII('N')  # 4E

 This expresses the intent that a) these are really byte values,
 not characters, and b) the specific choice of byte values was
 motivated by ASCII.

Definitely easier to read.  If I go this route I'll probably use ord(), 
though, since ASCII and Unicode agree on the first 128 code points (0-127), 
and there will be plenty of places to error out with a more appropriate 
message if I get garbage.  Since I really don't care what the actual 
integer values are, I'll skip those comments, too.



Hagen Fürstenau wrote:
 You still have the alternative

 EOH = ord('\r')
 CHAR = ord('C')
 ...

 which looks fine to me.

Yes it does.  I just dislike the (to me unnecessary) extra function 
call.  For those tuning in late to this thread, these are workarounds 
for this not working:


field_type = header[11] # field_type is now an int, not a 1-byte bstr
if field_type == b'C':  # b'C' is a 1-byte bytes object, so this always fails


Greg Ewing wrote:
 Guido van Rossum wrote:
 On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan wrote:
 Proposals to address this include:
 - introduce a character literal to allow c'a' as an alternative
 to ord('a')

 -1; the result is not a *character* but an integer.

 Would you be happier if it were spelled i'a' instead?

That would work for me, although I would prefer a'a' (for ASCII).  :)


Stephen J. Turnbull wrote:
 Put mascara on a pig, and you have a pig with mascara on, not Bette
 Davis.  I don't necessarily think you're doing anybody a service by
 making the hack of using ASCII bytes as mnemonics more beautiful.  I
 think Martin's version is as beautiful as this code should get.

I'll either use Martin's or Nick's.  The point of beauty here is the 
ease of readability.  I think less readable is worse, and we shouldn't 
have to write ugly, hard-to-read code or inefficient code just because 
we have to deal with byte streams that aren't unicode.



Nick Coghlan wrote:
 Agreed, but:

 EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

 is a shorter way to write the same thing.

 Going two per line makes it easier to mentally map the characters:

 EOH, CHAR = b'\rC'
 DATE, FLOAT = b'DF'
 INT, LOGICAL = b'IL'
 MEMO, NUMBER = b'MN'

Wow.  I didn't realize that could be done.  That very nearly makes up 
for not being able to do it one char at a time.


Thanks, Nick!


~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-06-13 Thread P.J. Eby

At 03:11 PM 6/13/2011 -0700, Ethan Furman wrote:

Nick Coghlan wrote:
 Agreed, but:

 EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

 is a shorter way to write the same thing.

 Going two per line makes it easier to mentally map the characters:

 EOH, CHAR = b'\rC'
 DATE, FLOAT = b'DF'
 INT, LOGICAL = b'IL'
 MEMO, NUMBER = b'MN'

Wow.  I didn't realize that could be done.  That very nearly makes 
up for not being able to do it one char at a time.


You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.



Re: [Python-Dev] Python 3.x and bytes

2011-06-13 Thread Nick Coghlan
On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com wrote:
 You can still do it one at a time:

 CHAR, = b'C'
 INT,  = b'I'
 ...

 etc.  I just tried it with Python 3.1 and it works there.

I almost mentioned that, although it does violate one of the
unwritten rules of the Zen (in this case, syntax shall not look
like grit on Tim's monitor)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-06-13 Thread exarkun

On 12:35 am, ncogh...@gmail.com wrote:
On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com 
wrote:

You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.


I almost mentioned that, although it does violate one of the
unwritten rules of the Zen (in this case, syntax shall not look
like grit on Tim's monitor)


   [CHAR] = b'C'
   [INT]  = b'I'
   ...

Jean-Paul


Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Ethan Furman

Guido van Rossum wrote:

On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan wrote:

Proposals to address this include:
- introduce a character literal to allow c'a' as an alternative to ord('a')


-1; the result is not a *character* but an integer. I'm personally
favoring using b'a'[0] and possibly hiding this in a constant
definition.


Using this method, my code now looks like:

# constants

EOH  = b'\r'[0]
CHAR = b'C'[0]
DATE = b'D'[0]
FLOAT = b'F'[0]
INT = b'I'[0]
LOGICAL = b'L'[0]
MEMO = b'M'[0]
NUMBER = b'N'[0]

This is not beautiful code.

~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Martin v. Löwis
 # constants
 
 EOH  = b'\r'[0]
 CHAR = b'C'[0]
 DATE = b'D'[0]
 FLOAT = b'F'[0]
 INT = b'I'[0]
 LOGICAL = b'L'[0]
 MEMO = b'M'[0]
 NUMBER = b'N'[0]
 
 This is not beautiful code.

In this case, I think the intent would be better captured with

def ASCII(c):
    return c.encode('ascii')

EOH     = ASCII('\r') # 0D
CHAR    = ASCII('C')  # 43
DATE    = ASCII('D')  # 44
FLOAT   = ASCII('F')  # 46
INT     = ASCII('I')  # 49
LOGICAL = ASCII('L')  # 4C
MEMO    = ASCII('M')  # 4D
NUMBER  = ASCII('N')  # 4E

This expresses the intent that a) these are really byte values,
not characters, and b) the specific choice of byte values was motivated
by ASCII.
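A sketch of how Martin's helper behaves in practice: `ASCII()` returns a length-1 bytes object, so these constants compare equal to one-byte *slices* of the data but not to *indexed* elements, which are ints. The `header` value below is hypothetical sample data, not a real file format.

```python
def ASCII(c):
    """Encode a one-character str as ASCII, giving a length-1 bytes object."""
    return c.encode('ascii')

CHAR = ASCII('C')
header = b'NAME\x00\x00\x00\x00\x00\x00\x00C'  # hypothetical record; byte 11 is 'C'

print(CHAR)                   # b'C'
print(header[11:12] == CHAR)  # True: slicing keeps the bytes type
print(header[11] == CHAR)     # False: indexing yields the int 67
print(header[11] == CHAR[0])  # True
```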

Regards,
Martin


Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Hagen Fürstenau
 EOH  = b'\r'[0]
 CHAR = b'C'[0]
 DATE = b'D'[0]
 FLOAT = b'F'[0]
 INT = b'I'[0]
 LOGICAL = b'L'[0]
 MEMO = b'M'[0]
 NUMBER = b'N'[0]
 
 This is not beautiful code.

You still have the alternative

EOH = ord('\r')
CHAR = ord('C')
...

which looks fine to me.
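`ord()` yields the integer code point directly, which for ASCII characters is the same value an indexed bytes object produces. A small check, using a hypothetical one-byte field value:

```python
EOH = ord('\r')
CHAR = ord('C')

field_type = b'C'[0]        # what data[i] would give for this byte
print(EOH, CHAR)            # 13 67
print(field_type == CHAR)   # True: both sides are ints

# ord() also accepts a length-1 bytes object.
print(ord(b'C') == CHAR)    # True
```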

Cheers,
Hagen



Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Greg Ewing

Guido van Rossum wrote:


On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan wrote:


Proposals to address this include:
- introduce a character literal to allow c'a' as an alternative to 
ord('a')


-1; the result is not a *character* but an integer.


Would you be happier if it were spelled i'a' instead?

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Stephen J. Turnbull
Ethan Furman writes:

  Using this method, my code now looks like:
  
  # constants

[...]

  This is not beautiful code.

Put mascara on a pig, and you have a pig with mascara on, not Bette
Davis.  I don't necessarily think you're doing anybody a service by
making the hack of using ASCII bytes as mnemonics more beautiful.  I
think Martin's version is as beautiful as this code should get.




Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Nick Coghlan
On Mon, Jun 13, 2011 at 3:18 AM, Ethan Furman et...@stoneleaf.us wrote:

 This is not beautiful code.

Agreed, but:

EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

is a shorter way to write the same thing.

Going two per line makes it easier to mentally map the characters:

EOH, CHAR = b'\rC'
DATE, FLOAT = b'DF'
INT, LOGICAL = b'IL'
MEMO, NUMBER = b'MN'
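The multiple-assignment form relies on the same fact: unpacking a bytes object binds one int per byte, and the number of targets must match the length exactly. A quick check of both layouts:

```python
EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

# Two per line binds the same values.
EOH2, CHAR2 = b'\rC'

print(EOH, CHAR, NUMBER)               # 13 67 78
print((EOH, CHAR) == (EOH2, CHAR2))    # True
```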

Or, as a variant on Martin's solution:

FORMAT_CHARS = dict(
  EOH     = '\r',
  CHAR    = 'C',
  DATE    = 'D',
  FLOAT   = 'F',
  INT     = 'I',
  LOGICAL = 'L',
  MEMO    = 'M',
  NUMBER  = 'N'
)

FORMAT_CODES = {name: char.encode('ascii') for name, char in FORMAT_CHARS.items()}
globals().update(FORMAT_CODES)
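A variant of the dict-based approach, sketched under two assumptions that are not part of the original suggestion: the codes are stored as ints (via `ord`) so they compare directly with indexed bytes, and they live on a `types.SimpleNamespace` (the name `FIELD` is hypothetical) rather than being injected into `globals()`. Iterating a dict yields only its keys, so the comprehension needs `.items()` to get name/character pairs.

```python
from types import SimpleNamespace

FORMAT_CHARS = dict(
    EOH='\r', CHAR='C', DATE='D', FLOAT='F',
    INT='I', LOGICAL='L', MEMO='M', NUMBER='N',
)

# .items() yields (name, char) pairs; ord() turns each char into an int.
FIELD = SimpleNamespace(**{name: ord(char) for name, char in FORMAT_CHARS.items()})

record = b'N'  # hypothetical one-byte field-type value
print(record[0] == FIELD.NUMBER)  # True
```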

Sure, there's no one obvious way at this stage, but that's because
we don't know yet if there even *should* be an obvious way to do this
(as conflating text and binary data is a bad idea in principle). By
not blessing any one way of handling the situation, we give
alternative solutions time to evolve naturally. If one turns out to be
clearly superior to the decode/process/encode cycle then hopefully
that will become clear at some point in the future.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-23 Thread Ethan Furman

Glyph Lefkowitz wrote:
In fact, I feel like I would want to push in the opposite direction: 
don't treat one-byte bytes slices less like integers; I wish I could 
more easily treat n-byte sequences _more_ like integers! :).  More 
protocols have 2-byte or 4-byte network-endian packed integers embedded 
in them than have individual tag bytes that I want to examine.


So are you thinking that bytes([01,56])[:2] == 120 ?  Or more along the 
lines of a .to_int() method?


~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-05-23 Thread Terry Reedy

On 5/23/2011 1:20 PM, Ethan Furman wrote:

Glyph Lefkowitz wrote:

In fact, I feel like I would want to push in the opposite direction:
don't treat one-byte bytes slices less like integers; I wish I could
more easily treat n-byte sequences _more_ like integers! :). More
protocols have 2-byte or 4-byte network-endian packed integers
embedded in them than have individual tag bytes that I want to examine.


So are you thinking that bytes([01,56])[:2] == 120 ? Or more along the
lines of a .to_int() method?


I believe that such things can be handled by the struct module.
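For the multi-byte case, the `struct` module (and, since Python 3.2, `int.from_bytes`) already decodes packed integers. A sketch with hypothetical wire data holding a 2-byte and a 4-byte network-order (big-endian) field:

```python
import struct

packet = bytes([0x01, 0x38, 0x00, 0x00, 0x00, 0x2A])  # hypothetical wire data

# '!' selects network (big-endian) byte order; H = unsigned 16-bit, I = unsigned 32-bit.
(length,) = struct.unpack('!H', packet[:2])
(value,) = struct.unpack('!I', packet[2:6])
print(length, value)  # 312 42

# int.from_bytes handles a single field without a format string.
print(int.from_bytes(packet[:2], 'big'))  # 312
```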

--
Terry Jan Reedy



Re: [Python-Dev] Python 3.x and bytes

2011-05-20 Thread Nick Coghlan
On Fri, May 20, 2011 at 10:40 AM, Ethan Furman et...@stoneleaf.us wrote:
 This behavior matches what I was imagining for having
 b'a' == 97.  They compare equal, yet remain distinct objects
 for all other purposes.

 If anybody has a link to or an explanation of why equal values must have
 equal hashes, I'm all ears.  My apologies in advance if this is an incredibly naive
 question.

Because whether or not two objects can coexist in the same hash table
should *not* depend on their hash values - it should depend on whether
or not they compare equal to each other. The use of hashing should
just be an optimisation, not fundamentally change the nature of the
comparison operation. (i.e. hash(a) == hash(b) and a == b is meant
to be a fast alternative to a == b, not a completely different
check).
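A small illustration of the invariant described above, using a deliberately broken, hypothetical class: its instances compare equal but hash differently, so a set never even calls `__eq__` on them and keeps both. The built-in numeric types, by contrast, keep equal values hashing equally across types, which is what `b'a' == 97` would also require of bytes.

```python
class BadKey:
    """Hypothetical class violating 'a == b implies hash(a) == hash(b)'."""

    def __init__(self, value, salt):
        self.value = value
        self.salt = salt

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

    def __hash__(self):
        # Wrong on purpose: the hash depends on an attribute __eq__ ignores.
        return hash((self.value, self.salt))


a, b = BadKey(97, salt=1), BadKey(97, salt=2)
print(a == b)       # True
print(len({a, b}))  # 2 in CPython: different hashes, so __eq__ is never consulted

# Built-in numbers that compare equal also hash equally.
print(hash(1) == hash(1.0) == hash(True))  # True
```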

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Stefan Behnel

Greg Ewing, 19.05.2011 00:02:

Georg Brandl wrote:


We do have

bytes.fromhex('deadbeef')


But again, there is a run-time overhead to this.


Well, yes, but it's negligible if you assign it to a suitable variable first.
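Stefan's point in code: do the `fromhex` conversion once, at import time, and reuse the resulting constant. The names `MAGIC` and `has_magic` and the sample data are hypothetical.

```python
# Computed once when the module is imported, not on every call.
MAGIC = bytes.fromhex('deadbeef')

def has_magic(data):
    """Return True if data starts with the (hypothetical) magic number."""
    return data.startswith(MAGIC)

print(MAGIC)                                # b'\xde\xad\xbe\xef'
print(has_magic(b'\xde\xad\xbe\xefrest'))   # True
```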

Stefan



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Xavier Morel
On 2011-05-19, at 07:28 , Georg Brandl wrote:
 On 19.05.2011 00:39, Greg Ewing wrote:
 Ethan Furman wrote:
 
 some_var[3] == b'd'
 
 1) a check to see if the bytes instance is length 1
 2) a check to see if
   i) the other object is an int, and
   ii) 0 <= other_obj < 256
 3) if 1 and 2, make the comparison instead of returning NotImplemented?
 
 It might seem convenient, but I'd worry that it would lead to
 even more confusion in other ways. If someone sees that
 
some_var[3] == b'd'
 
 is true, and that
 
some_var[3] == 100
 
 is also true, they might expect to be able to do things
 like
 
n = b'd' + 1
 
 and get 101... or maybe b'e'...
 
 Maybe they should :)

But why wouldn't they expect `b'de' + 1` to work as well in this case? If a 
1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
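For reference, Python 3 rejects mixing bytes and int in arithmetic, so each of the readings floated in this exchange has to be spelled out explicitly. A sketch of three of them: an int result, a shifted single byte, and the whole sequence treated as a big-endian counter.

```python
try:
    b'd' + 1
except TypeError as exc:
    print("TypeError:", exc)  # bytes + int is not defined

as_int = b'd'[0] + 1                                   # 101
as_byte = bytes([b'd'[0] + 1])                         # b'e'
as_counter = (int.from_bytes(b'de', 'big') + 1).to_bytes(2, 'big')  # b'df'
print(as_int, as_byte, as_counter)
```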


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Nick Coghlan
On Thu, May 19, 2011 at 5:10 AM, Eric Smith e...@trueblade.com wrote:
 On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
 Robert Collins writes:

   It's probably too late to change, but please don't try to argue that
   it's correct: the continued confusion of folk running into this is
   evidence that confusion *is happening*. Treat that as evidence and
   think about how to fix it going forward.

 Sorry, Rob, but you're just wrong here, and Nick is right.  It's
 possible to improve Python 3, but not to fix it in this respect.
 The Python 3 solution is correct, the Python 2 approach is not.
 There's no way to avoid discontinuity and confusion here.

 I don't think there's any connection between the way 2.x confused text
 strings and binary data (which certainly needed addressing) with the way
 that 3.x returns a different type for byte_str[i] than it does for
 byte_str[i:i+1]. I think it's the latter that's confusing to people.
 There's no particular requirement for different types that's needed to
 fix the byte/str problem.

It's a mental model problem. People try to think of bytes as
equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
closer to array.array('c'). Strings are basically *unique* in
returning a length 1 instance of themselves for indexing operations.
For every other sequence type, including tuples, lists and arrays,
slicing returns a new instance of the same type, while indexing will
typically return something different.
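A quick way to see the mental-model point: across sequence types, slicing returns the same type while indexing returns an element, and str is the odd one out. This sketch uses `array.array('B')`, since Python 3 dropped the old 'c' type code.

```python
from array import array

samples = [
    b'abc',
    bytearray(b'abc'),
    array('B', b'abc'),
    [97, 98, 99],
    (97, 98, 99),
    'abc',
]

# Print: container type, type of seq[0], type of seq[0:1]
for seq in samples:
    print(type(seq).__name__, type(seq[0]).__name__, type(seq[0:1]).__name__)
# bytes int bytes
# bytearray int bytearray
# array int array
# list int list
# tuple int tuple
# str str str
```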

Now, we definitely didn't *help* matters by keeping so many of the
default behaviours of bytes() and bytearray() coupled to ASCII-encoded
text, but that was a matter of practicality beating purity: there
really *are* a lot of wire protocols out there that are ASCII based.
In hindsight, perhaps we should have gone further in breaking things
to try to make the point about the mental model shift more forcefully.
(However, that idea carries with it its own problems).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Stephen J. Turnbull
Robert Collins writes:

  That's separate to the implementation issues I have mentioned in this
  thread and previous.

Oops, sorry.

Nevertheless, I personally think that b'a'[0] == 97 is a good idea,
and consistent with everything else in Python.  It's Unicode (str)
that is weird: its behavior is surprising to a C or Lisp programmer
when first encountered, but not enough to cause a heart attack given
how weird natural language is.  But I don't see why that weirdness (an
element of LIST of TYPE is a LIST of TYPE, hey, young man, you're very
smart but *it's turtles all the way down!*) should be replicated
elsewhere.

If you want your bytes object to behave like a str, it's very easy to
get that (.decode('latin1')), and nobody has yet demonstrated that
this is too time-inefficient for real work, given the other overhead
imposed by Python.  The space inefficiency could be dealt with as Greg
points out (by internally having a Unicode representation using 1 byte
instead of 2 or 4).  But if you want your bytes object to *be* a
string, then you're confused.  It isn't (any more).  Even if it's just
a matter of flipping one bit in the type field, a str-with-unibyte-representation
is not equal to a bytes object with the same bytes.
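The `.decode('latin1')` route works because Latin-1 maps each byte 0-255 to the code point with the same value, so the round trip is lossless for arbitrary data. A sketch using every possible byte value:

```python
raw = bytes(range(256))          # every possible byte value
text = raw.decode('latin-1')     # exactly one str character per byte

assert len(text) == len(raw)
assert all(ord(ch) == byte for ch, byte in zip(text, raw))
assert text.encode('latin-1') == raw   # lossless round trip

# str indexing now yields one-character strings, as 2.x str did.
print(text[ord('C')] == 'C')  # True
```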

For example, you write:

  urlparse converting bytes to 'str' to operate on them is at best a
  kludge - you're forcing 5 times the storage (the original bytes + 4
  bytes-per-byte when it's decoded into unicode) to work on something
  which is defined as a BNF *that uses ascii*.

Indeed it (RFC 3986) does *use* ASCII.  But I think there is confusion
in your words.  This is what the RFC says about that use of ASCII:

   2.  Characters

   The URI syntax provides a method of encoding data, presumably for the
   sake of identifying a resource, as a sequence of characters.  [...]

   The ABNF notation defines its terminal values to be non-negative
   integers (codepoints) based on the US-ASCII coded character set
   [ASCII].  Because a URI is a sequence of characters, we must invert
   that relation in order to understand the URI syntax.  Therefore, the
   integer values used by the ABNF must be mapped back to their
   corresponding characters via US-ASCII in order to complete the syntax
   rules.

I.e., ASCII is *irrelevant* to (the modern definition of) URLs except as
it is a convenient and familiar way to refer to a certain familiar and
rather small set of *characters*.  There are reasons for this (that
I'm not going to rehash here), and they are the *same* reasons why
Python 3's behavior is correct IMHO (modulo the issue about the type
of a list element, which I discuss above).

It is true that one might like there to be a literal that expresses
`ord(bytes-object-of-length-one)', ie, something like o'a' == 97.
(This is different from Greg's x'6465616462656566' == b'deadbeef',
which I don't think helps solve the confusion problem although it
would definitely be convenient.)


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Xavier Morel
On 2011-05-19, at 09:49 , Nick Coghlan wrote:
 On Thu, May 19, 2011 at 5:10 AM, Eric Smith e...@trueblade.com wrote:
 On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
 Robert Collins writes:
 
   It's probably too late to change, but please don't try to argue that
   it's correct: the continued confusion of folk running into this is
   evidence that confusion *is happening*. Treat that as evidence and
   think about how to fix it going forward.
 
 Sorry, Rob, but you're just wrong here, and Nick is right.  It's
 possible to improve Python 3, but not to fix it in this respect.
 The Python 3 solution is correct, the Python 2 approach is not.
 There's no way to avoid discontinuity and confusion here.
 
 I don't think there's any connection between the way 2.x confused text
 strings and binary data (which certainly needed addressing) with the way
 that 3.x returns a different type for byte_str[i] than it does for
 byte_str[i:i+1]. I think it's the latter that's confusing to people.
 There's no particular requirement for different types that's needed to
 fix the byte/str problem.
 
 It's a mental model problem. People try to think of bytes as
 equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
 closer to array.array('c'). Strings are basically *unique* in
 returning a length 1 instance of themselves for indexing operations.
 For every other sequence type, including tuples, lists and arrays,
 slicing returns a new instance of the same type, while indexing will
 typically return something different.
 
 Now, we definitely didn't *help* matters by keeping so many of the
 default behaviours of bytes() and bytearray() coupled to ASCII-encoded
 text, but that was a matter of practicality beating purity: there
 really *are* a lot of wire protocols out there that are ASCII based.
 In hindsight, perhaps we should have gone further in breaking things
 to try to make the point about the mental model shift more forcefully.
 (However, that idea carries with it its own problems).

For what it's worth, Erlang's approach to the subject is — in my
opinion — excellent:
binaries (whose literals are called bit syntax there) are quite
distinct from strings in both syntax and API, but you can put
chunks of strings within binaries (the bit syntax acts as a container,
in which you can put a literal or non-literal string). This
simultaneously impresses upon the user that binaries are *not* strings
and that they can still easily create binaries from strings.



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Stefan Behnel

Xavier Morel, 19.05.2011 09:41:

On 2011-05-19, at 07:28 , Georg Brandl wrote:

On 19.05.2011 00:39, Greg Ewing wrote:

If someone sees that

some_var[3] == b'd'

is true, and that

some_var[3] == 100

is also true, they might expect to be able to do things
like

n = b'd' + 1

and get 101... or maybe b'e'...


Maybe they should :)


But why wouldn't they expect `b'de' + 1` to work as well in this case? If a 
1-byte bytes is equivalent to an integer, why not an arbitrary one as well?


The result of this must obviously be b'de1'.

Stefan



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Nick Coghlan
OK, summarising the thread so far from my point of view.

1. There are some aspects of the behavior of bytes() objects that
tempt people to think of them as string-like objects (primarily the
b'' literals and their use in repr(), along with the fact that they
fill roles that were filled by str in its arbitrary binary data
incarnation in Python 2.x). The mental model this creates in the
reader is incorrect, as bytes() are far closer to array.array('c') in
their underlying behaviour (and deliberately so - cf. PEP 358, 3112,
3137).

One proposal for addressing this is to add an x'deadbeef' literal and
use that in repr() rather than the bytestring. Another would be to
escape all characters, even printable ASCII, in the bytes()
representation. Both of these are undesirable, as they miss the
original purpose of this behaviour: making it easier to work with the
many ASCII based wire protocols that are in widespread use.

To be honest, I don't think there is a lot we can do here except to
further emphasise in the documentation and elsewhere that *bytes is
not a string type* (regardless of any API similarities retained to
ease transition from the 2.x series). For example, if we have any
lingering references to byte strings they should be replaced with
byte sequences or bytes objects (depending on context, as the
former phrasing also encompasses bytearray objects).

2. As a concrete usability issue, it is awkward to programmatically
check the value of a specific byte when working with an ASCII based
protocol:

  data[i] == b'a'      # Intuitive, but always False due to type mismatch
  data[i:i+1] == b'a'  # Works, but clumsy
  data[i] == b'a'[0]   # Ditto (but at least susceptible to compiler const-expression optimisation)
  data[i] == ord('a')  # Clumsy and slow
  data[i] == 97        # Hard to read
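The five spellings in point 2 can be checked directly. A sketch with a hypothetical buffer whose byte at index 1 is `a`:

```python
data = b'xay'  # hypothetical buffer; data[1] is the byte of interest
i = 1

print(data[i] == b'a')       # False: an int never equals a bytes object
print(data[i:i+1] == b'a')   # True
print(data[i] == b'a'[0])    # True
print(data[i] == ord('a'))   # True
print(data[i] == 97)         # True
```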

Proposals to address this include:
- introduce a character literal to allow c'a' as an alternative to ord('a')
    Potentially workable, but leaves the intuitive answer above
    silently producing an unexpected answer
- allow 1-element byte sequences to compare equal to the corresponding
  integer values.
    - would require reworking of bytes.__hash__ to use the hash of the
      contained element when the data length is exactly 1
    - transitivity of equality would recommend also supporting
      equivalences such as b'a' == 97.0
    - backwards compatibility concerns arise due to introduction of
      new key collisions in dictionaries and sets and other value based
      containers
    - yet more string-like behaviour in a type that is *not* a string
      (further reinforcing the mistaken impression from point 1)
    - One thing that *isn't* a concern from my point of view is the
      fact that we have ample precedent in decimal.Decimal for supporting
      implicit coercion in comparison operations while disallowing them in
      arithmetic operations (Decimal(1) == 1.0 is allowed, but
      Decimal(1) + 1.0 will raise TypeError).
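The `decimal.Decimal` precedent is easy to confirm in Python 3.2 and later: mixed comparison with a float is supported (and equal values hash equally), while mixed arithmetic raises `TypeError`.

```python
from decimal import Decimal

print(Decimal(1) == 1.0)               # True: comparison with float is supported
print(hash(Decimal(1)) == hash(1.0))   # True: equal values share a hash

try:
    Decimal(1) + 1.0
except TypeError as exc:
    print("TypeError:", exc)           # arithmetic with float is refused
```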

For point 2, I'm personally +0 on the idea of having 1-element bytes
and bytearray objects delegate hashing and comparison operations to
the corresponding integer object. We have the power to make the
obvious code correct code, so let's do that. However, the implications
of the additional key collisions in value based containers may need to
be explored further.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Łukasz Langa
Message written by Stefan Behnel on 2011-05-19, at 10:37:

 But why wouldn't they expect `b'de' + 1` to work as well in this case? If 
 a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
 
 The result of this must obviously be b'de1'.

I hope you're joking. At best, the result should be b'de\x01'. But I don't 
think such a construct should be allowed. Just like you can't do `[1, 2, 3] + 4`. 
I wouldn't ever expect that a single byte behaves like a sequence of bytes. In 
the case of bytes b'a' is obviously still a sequence of bytes, just happening 
to store a single one. Indexing should return a byte so I'm not surprised it 
returns a number. Slicing on the other hand returns a sub-sequence.

However inconvenient, I find the current behaviour logical and predictable. A 
shortcut for b'a'[0] would obviously be nice but that's for python-ideas.

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Stefan Behnel

Łukasz Langa, 19.05.2011 11:25:

Message written by Stefan Behnel on 2011-05-19, at 10:37:


But why wouldn't they expect `b'de' + 1` to work as well in this case? If a 
1-byte bytes is equivalent to an integer, why not an arbitrary one as well?


The result of this must obviously be b'de1'.


I hope you're joking.


I obviously was. My point is that expectations and obvious behaviour 
may not be obvious to everyone.


Nick summed it up very nicely IMHO.

Stefan



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Xavier Morel
On 2011-05-19, at 11:25 , Łukasz Langa wrote:
 Message written by Stefan Behnel on 2011-05-19, at 10:37:
 
 But why wouldn't they expect `b'de' + 1` to work as well in this case? If 
 a 1-byte bytes is equivalent to an integer, why not an arbitrary one as 
 well?
 
 The result of this must obviously be b'de1'.
 I hope you're joking. At best, the result should be b'de\x01'.

Actually, if `b'd'+1` returns `b'e'` an equivalent behavior should be that 
`b'de'+1` returns `b'df'`.
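Python 3 accepts neither `b'd' + 1` nor `b'de' + 1`, so the arithmetic Xavier describes has to be spelled out by hand. The helper below is hypothetical, a sketch assuming the result should stay a bytes object and that going past 255 is an error rather than a wrap-around:

```python
def bump_last_byte(data, delta):
    """Hypothetical helper: add delta to the final byte of a non-empty bytes object.

    bytes() only accepts integers in range(256), so a result outside that
    range raises ValueError instead of wrapping around.
    """
    return data[:-1] + bytes([data[-1] + delta])


print(bump_last_byte(b'd', 1))    # b'e'
print(bump_last_byte(b'de', 1))   # b'df'
```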



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Antoine Pitrou
On Thu, 19 May 2011 17:49:47 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 
 It's a mental model problem. People try to think of bytes as
 equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
 closer to array.array('c'). Strings are basically *unique* in
 returning a length 1 instance of themselves for indexing operations.
 For every other sequence type, including tuples, lists and arrays,
 slicing returns a new instance of the same type, while indexing will
 typically return something different.
 
 Now, we definitely didn't *help* matters by keeping so many of the
 default behaviours of bytes() and bytearray() coupled to ASCII-encoded
 text, but that was a matter of practicality beating purity: there
 really *are* a lot of wire protocols out there that are ASCII based.

I think practicality beating purity should have been extended to
__getitem__ as well. I have almost never had a use for treating a
bytestring as a sequence of integers, while treating a bytestring as a
sequence of one-byte strings is *very* common.

(and, as you say, if you want a sequence of integers you can already
use array.array() which gives you more flexibility as to the width and
signedness of integers)
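For comparison, a small sketch of the array.array flexibility mentioned here: the same four bytes read as unsigned ('B') or signed ('b') integers, or as wider items. Note that Python 2's 'c' typecode no longer exists in Python 3; 'B' is the closest match to what indexing bytes returns.

```python
from array import array

raw = b'\x01\xff\x00\x80'

unsigned = array('B', raw)   # 'B': unsigned char, values 0..255 (like bytes)
signed = array('b', raw)     # 'b': signed char, values -128..127

print(list(raw))        # [1, 255, 0, 128]
print(list(unsigned))   # [1, 255, 0, 128]
print(list(signed))     # [1, -1, 0, -128]

# 'H' is an unsigned short (typically 2 bytes) in native byte order.
shorts = array('H', raw)
print(shorts.itemsize, len(shorts))
```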

Regards

Antoine.




Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Nick Coghlan
On Thu, May 19, 2011 at 6:43 PM, Nick Coghlan ncogh...@gmail.com wrote:
 For point 2, I'm personally +0 on the idea of having 1-element bytes
 and bytearray objects delegate hashing and comparison operations to
 the corresponding integer object. We have the power to make the
 obvious code correct code, so let's do that. However, the implications
 of the additional key collisions in value based containers may need to
 be explored further.

On further reflection, the key collision and semantics blurring
problems mean I am at best -0 on this particular solution to the
problem (and heading fairly rapidly in the direction of -1).

Best to just go with b'a'[0] and let the optimiser sort it out (PyPy
should handle it automatically, CPython would need work).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Michael Foord

On 19/05/2011 10:25, Łukasz Langa wrote:

Message written by Stefan Behnel on 2011-05-19, at 10:37:


But why wouldn't they expect `b'de' + 1` to work as well in this case? If a 
1-byte bytes is equivalent to an integer, why not an arbitrary one as well?

The result of this must obviously be b'de1'.

I hope you're joking. At best, the result should be b'de\x01'.
The behaviour Stefan suggests is what some weakly typed languages like 
Perl (and possibly PHP?) do, which masks errors and is rightly abhorred 
by Python programmers (although semantically not *so* different from 1 + 
1.0 == 2.0). I think it's safe to say that Stefan was joking.


Michael


  But I don't think such construct should be allowed. Just like you can't do 
`[1, 2, 3] + 4`. I wouldn't ever expect that a single byte behaves like a 
sequence of bytes. In the case of bytes b'a' is obviously still a sequence of 
bytes, just happening to store a single one. Indexing should return a byte so 
I'm not surprised it returns a number. Slicing on the other hand returns a 
sub-sequence.

However inconvenient, I find the current behaviour logical and predictable. A 
shortcut for b'a'[0] would obviously be nice but that's for python-ideas.




--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Ethan Furman

Nick Coghlan wrote:
OK, summarising the thread so far from my point of view. 


[snip]


To be honest, I don't think there is a lot we can do here except to
further emphasise in the documentation and elsewhere that *bytes is
not a string type* (regardless of any API similarities retained to
ease transition from the 2.x series). For example, if we have any
lingering references to "byte strings", they should be replaced with
"byte sequences" or "bytes objects" (depending on context, as the
former phrasing also encompasses bytearray objects).


I think this would be a big help.


2. As a concrete usability issue, it is awkward to programmatically
check the value of a specific byte when working with an ASCII based
protocol:

  data[i] == b'a' # Intuitive, but always False due to type mismatch
  data[i:i+1] == b'a'  # Works, but clumsy
  data[i] == b'a'[0]  # Ditto (but at least susceptible to compiler const-expression optimisation)
  data[i] == ord('a') # Clumsy and slow
  data[i] == 97 # Hard to read

Proposals to address this include:
- introduce a character literal to allow c'a' as an alternative to ord('a')
Potentially workable, but leaves the intuitive answer above
silently producing an unexpected answer


[snip]


For point 2, I'm personally +0 on the idea of having 1-element bytes
and bytearray objects delegate hashing and comparison operations to
the corresponding integer object. We have the power to make the
obvious code correct code, so let's do that. However, the implications
of the additional key collisions in value based containers may need to
be explored further.


Nick Coghlan also wrote:
 On further reflection, the key collision and semantics blurring
 problems mean I am at best -0 on this particular solution to the
 problem (and heading fairly rapidly in the direction of -1).

Last thought I have for a possible 'solution' -- when a bytes object is 
tested for equality against an int raise TypeError.  Precedent being 
sum() raising a TypeError when passed a list of strings because 
performance is so poor.  Reason here being that the intuitive behavior 
will never work and will always produce silent bugs.
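The sum() precedent can be checked directly in CPython 3, whose error message points at ''.join(). The sketch also shows the current behaviour the proposal targets: comparing a one-byte bytes object with an int is not an error, it is simply False.

```python
try:
    sum(['a', 'b', 'c'], '')
except TypeError as exc:
    print('TypeError:', exc)     # CPython suggests ''.join(seq) instead

print(''.join(['a', 'b', 'c']))  # abc

# Comparing bytes with an int does not raise today; it just returns False,
# which is the "silent bug" case.
print(b'a' == 97)                # False
```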


~Ethan~



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Guido van Rossum
On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan ncogh...@gmail.com wrote:
 OK, summarising the thread so far from my point of view.

 1. There are some aspects of the behavior of bytes() objects that
 tempt people to think of them as string-like objects (primarily the
 b'' literals and their use in repr(), along with the fact that they
 fill roles that were filled by str in its arbitrary binary data
 incarnation in Python 2.x). The mental model this creates in the
 reader is incorrect, as bytes() are far closer to array.array('c') in
 their underlying behaviour (and deliberately so - cf. PEP 358, 3112,
 3137).

I think most of this wrong mental model is actually due to people
not having completely internalized the Python 3 way.

 One proposal for addressing this is to add an x'deadbeef' literal and
 use that in repr() rather than the bytestring. Another would be to
 escape all characters, even printable ASCII, in the bytes()
 representation. Both of these are undesirable, as they miss the
 original purpose of this behaviour: making it easier to work with the
 many ASCII based wire protocols that are in widespread use.

Indeed, -1 on both.

 To be honest, I don't think there is a lot we can do here except to
 further emphasise in the documentation and elsewhere that *bytes is
 not a string type* (regardless of any API similarities retained to
 ease transition from the 2.x series). For example, if we have any
 lingering references to "byte strings", they should be replaced with
 "byte sequences" or "bytes objects" (depending on context, as the
 former phrasing also encompasses bytearray objects).

+1

 2. As a concrete usability issue, it is awkward to programmatically
 check the value of a specific byte when working with an ASCII based
 protocol:

  data[i] == b'a' # Intuitive, but always False due to type mismatch
  data[i:i+1] == b'a'  # Works, but clumsy
  data[i] == b'a'[0]  # Ditto (but at least susceptible to compiler const-expression optimisation)
  data[i] == ord('a') # Clumsy and slow
  data[i] == 97 # Hard to read

 Proposals to address this include:
 - introduce a character literal to allow c'a' as an alternative to ord('a')

-1; the result is not a *character* but an integer. I'm personally
favoring using b'a'[0] and possibly hiding this in a constant
definition.

Potentially workable, but leaves the intuitive answer above
 silently producing an unexpected answer

I'm not convinced that that problem is any worse than other
comparison-related problems. E.g. b'a' == 'a' also always returns
False (most likely it'll be disguised by at least one operand being a
variable of course.)

 - allow 1-element byte sequences to compare equal to the corresponding
 integer values.
    - would require reworking of bytes.__hash__ to use the hash of the
 contained element when the data length is exactly 1
    - transitivity of equality would recommend also supporting
 equivalences such as b'a' == 97.0
    - backwards compatibility concerns arise due to introduction of
 new key collisions in dictionaries and sets and other value based
 containers
    - yet more string-like behaviour in a type that is *not* a string
 (further reinforcing the mistaken impression from point 1)
    - One thing that *isn't* a concern from my point of view is the
 fact that we have ample precedent in decimal.Decimal for supporting
 implicit coercion in comparison operations while disallowing them in
 arithmetic operations (Decimal(1) == 1.0 is allowed, but
 Decimal(1) + 1.0 will raise TypeError).

 For point 2, I'm personally +0 on the idea of having 1-element bytes
 and bytearray objects delegate hashing and comparison operations to
 the corresponding integer object. We have the power to make the
 obvious code correct code, so let's do that. However, the implications
 of the additional key collisions in value based containers may need to
 be explored further.

My gut feeling about this is that this will probably introduce some
confusing or unintended side effect elsewhere, and I am -1 on this
change.
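The Decimal precedent quoted above is easy to verify. This sketch assumes Python 3.2 or later, where mixed Decimal/float comparisons became exact; the last line shows that, because the comparison is allowed, the hashes are kept consistent as well.

```python
from decimal import Decimal

# Mixed-type comparison is allowed...
print(Decimal(1) == 1.0)        # True

# ...but mixed-type arithmetic with float is refused.
try:
    Decimal(1) + 1.0
except TypeError as exc:
    print('TypeError:', exc)

# Equal values also hash equal, so they behave as one dict/set key.
print(hash(Decimal(1)) == hash(1.0))   # True
```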

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Guido van Rossum
On Thu, May 19, 2011 at 10:50 AM, Ethan Furman et...@stoneleaf.us wrote:
 Last thought I have for a possible 'solution' -- when a bytes object is
 tested for equality against an int raise TypeError.  Precedent being sum()
 raising a TypeError when passed a list of strings because performance is so
 poor.  Reason here being that the intuitive behavior will never work and
 will always produce silent bugs.

Not the same thing at all. The == operator is special, and should not
raise exceptions; too many things would start randomly failing (e.g.
membership tests for a dict that has both ints and bytes as keys, or
for a list containing a variety of types).
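To make the objection concrete, here is a sketch of the "raise on == int" idea as a hypothetical bytes subclass, StrictBytes. A plain membership test on a list holding both bytes and ints then fails instead of returning an answer:

```python
class StrictBytes(bytes):
    """Hypothetical bytes subclass whose == raises TypeError against an int."""

    def __eq__(self, other):
        if isinstance(other, int):
            raise TypeError('bytes compared with int')
        return bytes.__eq__(self, other)

    # Defining __eq__ alone would set __hash__ to None; keep bytes hashing.
    __hash__ = bytes.__hash__


mixed = [StrictBytes(b'a'), 97]

# Membership testing compares the target with each element in turn, so a
# single raising __eq__ breaks an otherwise ordinary container lookup.
try:
    97 in mixed
    print('membership test succeeded')
except TypeError as exc:
    print('membership test failed:', exc)
```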

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Glyph Lefkowitz

On May 19, 2011, at 1:43 PM, Guido van Rossum wrote:

 -1; the result is not a *character* but an integer.

Well, really the result ought to be an octet, but I suppose adding an 'octet' 
type is beyond the scope of even this sprawling discussion :).

 I'm personally favoring using b'a'[0] and possibly hiding this in a constant 
 definition.

As someone who spends a frankly unfortunate amount of time handling protocols 
where things like this are necessary, I agree with this recommendation.  In 
protocols where one needs to compare network data with one-byte type 
identifiers or packet prefixes, more (documented) constants and less 
inscrutable junk like

if p == 'c':
   ...
elif p == 'j':
   ...
elif p == 'J': # for compatibility
   ...

would definitely be a good thing.  Of course, I realize that this sort of 
programmer will most likely replace those constants with 99, 106, 74 rather than take
a moment to document what they mean, but at least they'll have to pause for a 
moment and realize that they have now lost _all_ mnemonics...
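A sketch of the constant-based style recommended here, applied to the example above. The constant names, the classify() function and its return labels are hypothetical, since the thread does not say what the 'c', 'j' and 'J' tags mean; the packet is assumed to be bytes, so packet[0] is an int in Python 3.

```python
# Hypothetical constant names; each value is computed once, at import time.
TAG_C = b'c'[0]         # 99
TAG_J = b'j'[0]         # 106
TAG_J_COMPAT = b'J'[0]  # 74, accepted for compatibility


def classify(packet):
    """Return a label for the one-byte type identifier at the start of packet."""
    p = packet[0]  # indexing bytes yields an int in Python 3
    if p == TAG_C:
        return 'c'
    elif p == TAG_J:
        return 'j'
    elif p == TAG_J_COMPAT:  # for compatibility
        return 'j'
    raise ValueError('unknown packet type %r' % packet[:1])
```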

In fact, I feel like I would want to push in the opposite direction: don't 
treat one-byte bytes slices less like integers; I wish I could more easily 
treat n-byte sequences _more_ like integers! :).  More protocols have 2-byte or 
4-byte network-endian packed integers embedded in them than have individual tag 
bytes that I want to examine.  For the typical ASCII-ish protocol where you 
want to look at command names and CRLF-separated messages, you'd never want to 
look at an individual octet; stringish operations like split() will give you
what you want.
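The multi-byte case is already served by int.from_bytes() (Python 3.2+) and the struct module. The header layout below, a 2-byte length followed by a 4-byte identifier in network byte order, is a made-up example:

```python
import struct

# Hypothetical header: 2-byte length, 4-byte id (both big-endian), payload.
packet = b'\x00\x05\x00\x00\x01\x00hello'

length = int.from_bytes(packet[0:2], 'big')
ident = int.from_bytes(packet[2:6], 'big')
print(length, ident)                      # 5 256

# struct does the same in one call: '!' selects network (big-endian) order
# with standard sizes, 'H' is an unsigned 2-byte int, 'I' an unsigned 4-byte int.
print(struct.unpack('!HI', packet[:6]))   # (5, 256)
print(packet[6:6 + length])               # b'hello'
```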



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Georg Brandl
On 19.05.2011 10:37, Stefan Behnel wrote:
 Xavier Morel, 19.05.2011 09:41:
 On 2011-05-19, at 07:28 , Georg Brandl wrote:
 On 19.05.2011 00:39, Greg Ewing wrote:
 If someone sees that
 
 some_var[3] == b'd'
 
 is true, and that
 
 some_var[3] == 100
 
 is also true, they might expect to be able to do things like
 
 n = b'd' + 1
 
 and get 101... or maybe b'e'...
 
 Maybe they should :)
 
 But why wouldn't they expect `b'de' + 1` to work as well in this case? If
 a 1-byte bytes is equivalent to an integer, why not an arbitrary one as
 well?
 
 The result of this must obviously be b'de1'.

To clarify my original one-liner: if bytes objects (but only one-char bytes
objects) equal integers, you should rightly expect to treat them as integers.

This is obviously *not* desirable from a strong-typing POV.

Georg



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Terry Reedy

On 5/19/2011 3:49 AM, Nick Coghlan wrote:


It's a mental model problem. People try to think of bytes as
equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
closer to array.array('c').


Or like C char arrays


Strings are basically *unique* in
returning a length 1 instance of themselves for indexing operations.


I still remember having to work that out and get used to it.
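The point about strings being unique can be checked across the built-in sequence types with a short loop: only str returns an instance of its own type for seq[0], while every type returns its own type for seq[0:1].

```python
from array import array

samples = {
    'str': 'abc',
    'bytes': b'abc',
    'bytearray': bytearray(b'abc'),
    'list': [97, 98, 99],
    'tuple': (97, 98, 99),
    'array': array('B', b'abc'),
}

for name, seq in samples.items():
    item, piece = seq[0], seq[0:1]
    print('%-9s [0] -> %-4s [0:1] -> %s'
          % (name, type(item).__name__, type(piece).__name__))
```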

--
Terry Jan Reedy



Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Ethan Furman

Nick Coghlan wrote:

On Thu, May 19, 2011 at 6:43 PM, Nick Coghlan ncogh...@gmail.com wrote:

For point 2, I'm personally +0 on the idea of having 1-element bytes
and bytearray objects delegate hashing and comparison operations to
the corresponding integer object. We have the power to make the
obvious code correct code, so let's do that. However, the implications
of the additional key collisions in value based containers may need to
be explored further.


Several folk have said that objects that compare equal must hash equal...

Why?  It's an honest question.  Here's what I have tried:

--> class Wierd():
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return self.value == other
...     def __hash__(self):
...         return hash((self.value + 13) ** 3)
...
--> one = Wierd(1)
--> two = Wierd(2)
--> three = Wierd(3)
--> one
<__main__.Wierd object at 0x00BFE710>
--> one == 1
True
--> one == 2
False
--> two == 2
True
--> three == 3
True
--> d = dict()
--> d[one] = '1'
--> d[two] = '2'
--> d[three] = '3'
--> d
{<__main__.Wierd object at 0x00BFE710>: '1',
 <__main__.Wierd object at 0x00BFE870>: '3',
 <__main__.Wierd object at 0x00BFE830>: '2'}
--> d[1] = '1.0'
--> d[2] = '2.0'
--> d[3] = '3.0'
--> d
{<__main__.Wierd object at 0x00BFE870>: '3',
 1: '1.0',
 2: '2.0',
 3: '3.0',
 <__main__.Wierd object at 0x00BFE830>: '2',
 <__main__.Wierd object at 0x00BFE710>: '1'}
--> d[2]
'2.0'
--> d[two]
'2'

This behavior matches what I was imagining for having
b'a' == 97.  They compare equal, yet remain distinct objects
for all other purposes.

If anybody has a link to or an explanation why equal values must be 
equal hashes I'm all ears.  My apologies in advance if this is an 
incredibly naive question.


~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Benjamin Peterson
2011/5/19 Ethan Furman et...@stoneleaf.us:
 If anybody has a link to or an explanation why equal values must be equal
 hashes I'm all ears.  My apologies in advance if this is an incredibly naive
 question.

https://secure.wikimedia.org/wikipedia/en/wiki/Hash_table


-- 
Regards,
Benjamin


Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Raymond Hettinger

On May 19, 2011, at 7:40 PM, Ethan Furman wrote:

 Several folk have said that objects that compare equal must hash equal...

And so do the docs:
http://docs.python.org/dev/reference/datamodel.html#object.__hash__
"The only required property is that objects which compare equal have the
same hash value."


Raymond
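Ethan's Wierd class from earlier in the thread (spelling kept) makes the consequence of breaking this rule visible. The list membership test uses only ==, but in CPython dict and set lookups compare stored hashes first, so with different hashes they never get as far as calling __eq__:

```python
class Wierd:
    """Ethan's class: compares equal to its value but hashes differently."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other

    def __hash__(self):
        return hash((self.value + 13) ** 3)


one = Wierd(1)

print(one == 1)          # True: __eq__ says they are equal
print(one in [1])        # True: list membership only uses ==
print(1 in {one: 'x'})   # False: the dict lookup never finds a matching hash
print(len({one, 1}))     # 2: the set keeps both "equal" elements
```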



Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Georg Brandl
On 18.05.2011 07:39, Greg Ewing wrote:
 Ethan Furman wrote:
 
 On the one hand we have the 'bytes are ascii data' type interface, and 
 on the other we have the 'bytes are a list of integers between 0 - 255'
 interface.
 
 I think the weird part is that there exists a literal for
 writing a byte array as an ascii string, and furthermore
 that it's the *only* kind of literal available for bytes.
 
 Personally I think that the default literal syntax for
 bytes, and also the form produced by repr(), should have
 been something more neutral, such as hex, with the ascii
 form available for use when it makes sense. Currently if
 you want to write a bytes literal in hex, you have to
 say something like
 
 some_var = b'\xde\xad\xbe\xef'
 
 which is ugly and unreadable. Much nicer would be
 
 some_var = x'deadbeef'

We do have

  bytes.fromhex('deadbeef')
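For reference, a sketch of the round trip with existing standard-library calls. bytes.fromhex() also accepts spaces between byte pairs, and binascii.hexlify() converts back (bytes.hex() does the same from Python 3.5 on):

```python
import binascii

blob = bytes.fromhex('deadbeef')
print(blob == b'\xde\xad\xbe\xef')              # True
print(bytes.fromhex('de ad be ef') == blob)     # True: spaces are skipped

print(binascii.hexlify(blob))                   # b'deadbeef'
```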

Georg



Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Martin v. Löwis
 Is there code out there that is using this list-of-ints interface?

Just in case this isn't clear yet: yes, certainly. Any non-trivial piece
of Python 3 code that has been written already (and there is some) will
have run into that issue.
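A tiny example of the kind of code meant here: code that depends on iteration over bytes yielding ints. The checksum is a hypothetical 8-bit additive one, not taken from any particular protocol:

```python
def checksum8(data):
    """Hypothetical 8-bit additive checksum over a bytes object.

    Iterating over bytes yields ints in Python 3, so sum() works directly;
    iterating over a Python 2 str yielded 1-character strings instead.
    """
    return sum(data) & 0xFF


print(checksum8(b'abc'))   # 38, i.e. (97 + 98 + 99) & 0xFF
```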

Regards,
Martin


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Bill Janssen
Georg Brandl g.bra...@gmx.net wrote:

 We do have
 
   bytes.fromhex('deadbeef')

Sort of reminds me of Java's Integer.parseInt(), and not in a good way.

Bill


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Ethan Furman

Greg Ewing wrote:

Ethan Furman wrote:

On the one hand we have the 'bytes are ascii data' type interface, and 
on the other we have the 'bytes are a list of integers between 0 - 
255' interface.


I think the weird part is that there exists a literal for
writing a byte array as an ascii string, and furthermore
that it's the *only* kind of literal available for bytes.


That is the point I was trying to make -- thank you for stating it more 
clearly than I managed to.  :)




Personally I think that the default literal syntax for
bytes, and also the form produced by repr(), should have
been something more neutral, such as hex,


Agreed.  It is surprising to extract an element out of bytes, and not 
end up with bytes, but with an int -- if the repr used something besides 
the plain ascii representation, this would not be an expectation.  For 
comparison, when one extracts an element out of a str one gets a str -- 
not the int representing the unicode code point.



with the ascii form available for use when it makes sense.

As for


--> some_other_var[3] == b'd'


there ought to be a literal for specifying an integer
using an ascii character, so you could say something like

  if some_other_var[3] == c'd':

which would be equivalent to

  if some_other_var[3] == ord(b'd'):

but without the overhead of computing the value each time
at run time.


Given that we can't change the behavior of b'abc'[1], that would be 
better than what we have.


+1

~Ethan~



Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Stephen J. Turnbull
Robert Collins writes:

  It's probably too late to change, but please don't try to argue that
  it's correct: the continued confusion of folk running into this is
  evidence that confusion *is happening*. Treat that as evidence and
  think about how to fix it going forward.

Sorry, Rob, but you're just wrong here, and Nick is right.  It's
possible to improve Python 3, but not to fix it in this respect.
The Python 3 solution is correct, the Python 2 approach is not.
There's no way to avoid discontinuity and confusion here.

Confusion is indeed happening, but it's real confusion in the way
people think about the problem space, not a language design cockup.
The problem can't be solved by embedding ASCII in Unicode, because
non-ASCII bytes don't have a canonical embedding in Unicode.  I.e., the
situation is inherently confusing.  You can't wish it away, you can
only choose to impose more or less of it on particular constituencies.

Now, it's quite possible that there are other correct approaches that
allow straightforward manipulation of non-ASCII text, but I don't know
what they are, and I don't know anybody else who does.
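A short illustration of the "no canonical embedding" point, using only standard codecs: the ASCII prefix decodes identically everywhere, but a single non-ASCII byte means different characters under different codecs and is not valid UTF-8 on its own. ascii() is used so the output stays printable on any console.

```python
raw = b'caf\xe9'   # 'caf' is ASCII; the final byte 0xE9 is not

as_latin1 = raw.decode('latin-1')  # 0xE9 -> U+00E9 (e with acute accent)
as_cp1251 = raw.decode('cp1251')   # 0xE9 -> U+0439 (Cyrillic short i)
print(ascii(as_latin1), ascii(as_cp1251))   # 'caf\xe9' 'caf\u0439'

try:
    raw.decode('utf-8')            # a lone 0xE9 is not valid UTF-8
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc.reason)
```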




Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread R. David Murray
On Thu, 19 May 2011 01:16:44 +0900, Stephen J. Turnbull step...@xemacs.org 
wrote:
 Robert Collins writes:
 
   It's probably too late to change, but please don't try to argue that
   it's correct: the continued confusion of folk running into this is
   evidence that confusion *is happening*. Treat that as evidence and
   think about how to fix it going forward.
 
 Sorry, Rob, but you're just wrong here, and Nick is right.  It's
 possible to improve Python 3, but not to fix it in this respect.
 The Python 3 solution is correct, the Python 2 approach is not.
 There's no way to avoid discontinuity and confusion here.
 
 Confusion is indeed happening, but it's real confusion in the way
 people think about the problem space, not a language design cockup.

Note that the more common idiom (not that I can measure it, mind)
when dealing with byte strings is something analogous to

if my_byte_string[i:i+1] == b'x':

rather than

if my_byte_string[i] == 120:

and the former is a lot more readable than the latter, even though
you have to stare at the slice for a couple seconds the first time
you encounter it to realize what is going on.
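The idioms discussed in this part of the thread, evaluated side by side for a sample buffer whose first byte is b'x' (integer value 120); the buffer contents are illustrative only:

```python
data = b'xay'
i = 0

print(data[i] == b'x')        # False: an int compared with bytes
print(data[i:i+1] == b'x')    # True: slicing keeps the bytes type
print(data[i] == ord(b'x'))   # True
print(data[i] == b'x'[0])     # True
print(data[i] == 120)         # True, but the magic number hides the meaning
```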

So *something* is wrong with Python3's approach.  Python2 was wronger,
though :)

--David


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Martin v. Löwis
 Note that the more common idiom (not that I can measure it, mind)
 when dealing with byte strings is something analogous to
 
 if my_byte_string[i:i+1] == b'x':
 
 rather than
 
 if my_byte_string[i] == 120:

FWIW, another spelling of this is

  if my_byte_string[i] == ord(b'x'):

From a readability point of view, it's in the same category as the first one,
but less twisted.
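This spelling relies on ord() accepting a bytes object of length 1 as well as a one-character str; anything longer is a TypeError:

```python
print(ord(b'x'))    # 120: ord() accepts a length-1 bytes object
print(ord('x'))     # 120: and a length-1 str

try:
    ord(b'xy')      # longer inputs are rejected
except TypeError as exc:
    print('TypeError:', exc)
```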

Regards,
Martin


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Eric Smith
On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
 Robert Collins writes:
 
   Its probably too late to change, but please don't try to argue that
   its correct: the continued confusion of folk running into this is
   evidence that confusion *is happening*. Treat that as evidence and
   think about how to fix it going forward.
 
 Sorry, Rob, but you're just wrong here, and Nick is right.  It's
 possible to improve Python 3, but not to fix it in this respect.
 The Python 3 solution is correct, the Python 2 approach is not.
 There's no way to avoid discontinuity and confusion here.

I don't think there's any connection between the way 2.x confused text
strings and binary data (which certainly needed addressing) and the way
that 3.x returns a different type for byte_str[i] than it does for
byte_str[i:i+1]. I think it's the latter that's confusing to people.
Nothing about fixing the bytes/str problem requires indexing and slicing
to return different types.

And of course it's too late to make any change to this.

Eric.


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Ethan Furman

Ethan Furman wrote:

Greg Ewing wrote:

As for


--> some_other_var[3] == b'd'


there ought to be a literal for specifying an integer
using an ascii character, so you could say something like

  if some_other_var[3] == c'd':

which would be equivalent to

  if some_other_var[3] == ord(b'd'):

but without the overhead of computing the value each time
at run time.


Given that we can't change the behavior of b'abc'[1], that would be 
better than what we have.


+1


Here's another thought, that perhaps is not backwards-incompatible...

some_var[3] == b'd'

At some point, the bytes class' __eq__ will be called -- is there a 
reason why we cannot have


1) a check to see if the bytes instance is length 1
2) a check to see if
   i) the other object is an int, and
   ii) 0 <= other_obj < 256
3) if 1 and 2, make the comparison instead of returning NotImplemented?

This makes sense to me -- after all, the bytes class is an array of ints 
in range(256);  it is a special case, but doesn't feel any more special 
than passing an int into bytes() giving a string of that many null 
bytes; and it would get rid of the, in my opinion ugly, idiom of


some_var[i:i+1] == b'd'

It would also not require a new literal syntax.
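A sketch of the three steps above as a hypothetical bytes subclass, OneByteEq. The __hash__ override is not part of the proposal; it is added because, as pointed out in replies, equal objects must hash equal. The last line shows the transitivity problem also raised in replies: the object equals 100 and 100 equals 100.0, yet the object does not equal 100.0.

```python
class OneByteEq(bytes):
    """Hypothetical bytes subclass sketching the length-1 == int proposal."""

    def __eq__(self, other):
        # Steps 1 and 2: length-1 instance versus an int in range(256).
        if len(self) == 1 and isinstance(other, int) and 0 <= other < 256:
            return self[0] == other          # step 3: compare the values
        return bytes.__eq__(self, other)     # otherwise: normal bytes rules

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Not in the proposal: equal objects must hash equal, so a length-1
    # instance borrows the hash of its integer value.
    def __hash__(self):
        return hash(self[0]) if len(self) == 1 else bytes.__hash__(self)


d = OneByteEq(b'd')
print(d == 100, d != 100)   # True False
print(len({d, 100}))        # 1: a set now treats them as one key
print(d == 100.0)           # False: equality is no longer transitive
```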

~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Georg Brandl
On 18.05.2011 21:06, Martin v. Löwis wrote:
 Note that the more common idiom (not that I can measure it, mind)
 when dealing with byte strings is something analogous to
 
 if my_byte_string[i:i+1] == b'x':
 
 rather than
 
 if my_byte_string[i] == 120:
 
 FWIW, another spelling of this is
 
   if my_byte_string[i] == ord(b'x'):
 
 From a readability point of view, it's in the same category as the first one,
 but less twisted.

Probably more twisted:

if my_byte_string[i] == b'x'[0]:

:)

Georg



Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Ethan Furman

Ethan Furman wrote:

[...]

Also posted to Python-Ideas.

~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Martin v. Löwis
 Here's another thought, that perhaps is not backwards-incompatible...
 
 some_var[3] == b'd'
 
 At some point, the bytes class' __eq__ will be called -- is there a
 reason why we cannot have
 
 1) a check to see if the bytes instance is length 1
 2) a check to see if
    i) the other object is an int, and
    ii) 0 <= other_obj < 256
 3) if 1 and 2, make the comparison instead of returning NotImplemented?

Immutable objects that compare equal should hash equal;
so we would also have to change the hashing of byte strings. Not sure
whether that, in turn, has undesirable consequences.

In addition, equality should be transitive, so b'A' == 65.0.
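The transitivity remark reflects how the existing numeric types already behave. In Python 3.2 or later, ints, floats, Decimals and Fractions with the same value all compare equal to each other and share one hash, and that is the group a one-byte bytes object would be joining:

```python
from decimal import Decimal
from fractions import Fraction

values = [97, 97.0, Decimal(97), Fraction(97)]

# Every pair compares equal...
print(all(a == b for a in values for b in values))   # True
# ...and, as the hash invariant requires, they all share one hash...
print(len({hash(v) for v in values}))                 # 1
# ...so a set or dict treats them as a single key.
print(len(set(values)))                               # 1

print(b'a' == 97)   # False today: bytes stays outside this group
```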

Regards,
Martin


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Ethan Furman

Martin v. Löwis wrote:

Here's another thought, that perhaps is not backwards-incompatible...

some_var[3] == b'd'

At some point, the bytes class' __eq__ will be called -- is there a
reason why we cannot have

1) a check to see if the bytes instance is length 1
2) a check to see if
   i) the other object is an int, and
   ii) 0 <= other_obj < 256
3) if 1 and 2, make the comparison instead of returning NotImplemented?


Immutable objects that compare equal should hash equal;
so we would also have to change the hashing of byte strings. Not sure
whether that, in turn, has undesirable consequences.


I thought it was the other-way-round -- if they hash equal, they should 
compare equal?  Or is this just for immutables?



In addition, equality should be transitive, so b'A' == 65.0.


I'm not sure what you're getting at...  we could certainly have step 2 
check for a number instead of an int, and then step 3 could extract the 
one element, giving an int, and then let that int compare itself with 
the other number, whether it be int, float, fraction, what-have-you.
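To make the proposal concrete, here is a minimal sketch of the three checks as a hypothetical bytes subclass. LenientBytes is an invented name, and this is not how CPython's bytes behaves. It compares against any real number, per the refinement above, and it also shows the hashing cost Martin raised: to keep equal objects hashing equal, a length-1 instance must hash like its integer, which no longer matches the hash of the plain bytes value it also compares equal to.

```python
import numbers


class LenientBytes(bytes):
    """Hypothetical bytes subclass implementing the proposed comparison."""

    def __eq__(self, other):
        # Steps 1 and 2: a length-1 instance compared with a real number.
        if len(self) == 1 and isinstance(other, numbers.Real):
            # Step 3: compare the single byte's integer value.
            return self[0] == other
        return super().__eq__(other)   # NotImplemented for unsupported types

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equal, so b'd' has to hash like 100.
        if len(self) == 1:
            return hash(self[0])
        return super().__hash__()


d = LenientBytes(b'd')
print(d == 100, d == 100.0, d != 101)   # True True True
print(d == b'd')                         # True: ordinary bytes equality still works
print(hash(d) == hash(100))              # True: hashes like the int 100
# A plain b'd' keeps the ordinary bytes hash, so b'd' and LenientBytes(b'd')
# compare equal yet (almost always) hash differently: the invariant breaks.
```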



~Ethan~


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Terry Reedy

On 5/18/2011 4:10 PM, Ethan Furman wrote:

Ethan Furman wrote:

[...]

Also posted to Python-Ideas.


Good. That is where it should have gone in the first place, as this is 
about ideas not yet even in the PEP stage.


--
Terry Jan Reedy



Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Martin v. Löwis
 Immutable objects that compare equal should hash equal;
 so we would also have to change the hashing of byte strings. Not sure
 whether that, in turn, has undesirable consequences.
 
 I thought it was the other-way-round -- if they hash equal, they should
 compare equal?

No no no. If they hash equal, it could just be a hash collision -
objects of a class could all hash to 42, if they wanted to.
Dictionaries require the property I mentioned. If they compare
equal, but hash differently, a dictionary lookup would fail to
find the key.
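A small sketch of both directions of that rule (the class names are made up): a key type that compares equal but hashes by identity cannot be found again, while a type whose instances all hash to 42 still works, only with more collisions.

```python
class BadKey:
    """Compares equal by value but (wrongly) hashes by identity."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

    def __hash__(self):
        return id(self)            # violates "equal implies equal hash"


class GoodKey(BadKey):
    def __hash__(self):
        return hash(self.value)    # equal objects now hash equal


class SlowKey(BadKey):
    def __hash__(self):
        return 42                  # legal: every lookup just collides


bad = {BadKey(1): 'found'}
good = {GoodKey(1): 'found'}
slow = {SlowKey(1): 'found'}

print(BadKey(1) == BadKey(1))      # True
print(BadKey(1) in bad)            # False: the hashes differ, so __eq__ is never tried
print(GoodKey(1) in good)          # True
print(SlowKey(1) in slow)          # True
```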

 In addition, equality should be transitive, so b'A' == 65.0.
 
 I'm not sure what you're getting at...

That it is counter-intuitive to have a bytes object compare equal
to a floating-point number.
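The transitivity point can be checked directly. Python already treats 65, 65.0 and Fraction(65) as equal and hashes them alike, so they are interchangeable as dict keys; making b'A' == 65 true would therefore drag b'A' == 65.0 along with it. A short sketch:

```python
from fractions import Fraction

# Numbers of different types that compare equal already hash equal,
# so they are interchangeable as dictionary keys.
assert 65 == 65.0 == Fraction(65)
assert hash(65) == hash(65.0) == hash(Fraction(65))

table = {65: 'A'}
print(table[65.0], table[Fraction(65)])   # A A

# Today a bytes object never equals a number, so no such chain arises:
print(b'A' == 65, b'A'[0] == 65, b'A'[0] == 65.0)   # False True True
```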

Regards,
Martin


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Greg Ewing

Georg Brandl wrote:


We do have

  bytes.fromhex('deadbeef')


But again, there is a run-time overhead to this.
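One way to live with that overhead today is to pay it once, at import time, by binding the result to a module-level name. This sketch (MAGIC and has_magic are made-up names) also confirms that bytes.fromhex produces the same value as the escaped literal.

```python
# Computed once, when the module is imported, rather than on every use.
MAGIC = bytes.fromhex('deadbeef')

assert MAGIC == b'\xde\xad\xbe\xef'
assert bytes.fromhex('de ad be ef') == MAGIC   # spaces between byte pairs are allowed


def has_magic(header):
    """Return True if header starts with the (hypothetical) magic number."""
    return header.startswith(MAGIC)


print(has_magic(b'\xde\xad\xbe\xef\x00\x01'))   # True
```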

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Greg Ewing

Eric Smith wrote:


And of course it's too late to make any change to this.


It's too late to change the meaning of b'...', but is it
really too late to introduce an x'...' literal and change
the repr() to produce it?

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Greg Ewing

Ethan Furman wrote:


some_var[3] == b'd'

1) a check to see if the bytes instance is length 1
2) a check to see if
   i) the other object is an int, and
   ii) 0 <= other_obj < 256
3) if 1 and 2, make the comparison instead of returning NotImplemented?


It might seem convenient, but I'd worry that it would lead to
even more confusion in other ways. If someone sees that

   some_var[3] == b'd'

is true, and that

   some_var[3] == 100

is also true, they might expect to be able to do things
like

   n = b'd' + 1

and get 101... or maybe b'e'...
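For reference, this is what Python 3 actually does with those expressions today: each reading has to be spelled out explicitly. A small sketch:

```python
d = b'd'

try:
    d + 1
except TypeError:
    print('TypeError')            # bytes and int do not mix implicitly

print(d[0] + 1)                   # 101: index first to get the int
print(bytes([d[0] + 1]))          # b'e': wrap the int back into a bytes object
```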

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Eric Smith
On 5/18/2011 6:32 PM, Greg Ewing wrote:
 Eric Smith wrote:
 
 And of course it's too late to make any change to this.
 
 It's too late to change the meaning of b'...', but is it
 really too late to introduce an x'...' literal and change
 the repr() to produce it?

By 'this' I meant the different types returned by b[i] and b[i:i+1].

Eric.
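The two expressions differ in more than type: a one-byte slice never raises, which is one reason the b[i:i+1] == b'x' idiom is popular in parsers. A short illustration:

```python
data = b'abc'

print(type(data[1]), type(data[1:2]))   # <class 'int'> <class 'bytes'>
print(data[1], data[1:2])               # 98 b'b'

# Slicing past the end quietly returns an empty bytes object ...
print(data[5:6])                        # b''

# ... while indexing past the end raises.
try:
    data[5]
except IndexError:
    print('IndexError')
```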


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Robert Collins
On Thu, May 19, 2011 at 4:16 AM, Stephen J. Turnbull step...@xemacs.org wrote:
 Robert Collins writes:

   It's probably too late to change, but please don't try to argue that
   it's correct: the continued confusion of folk running into this is
   evidence that confusion *is happening*. Treat that as evidence and
   think about how to fix it going forward.

 Sorry, Rob, but you're just wrong here, and Nick is right.  It's
 possible to improve Python 3, but not to fix it in this respect.
 The Python 3 solution is correct, the Python 2 approach is not.
 There's no way to avoid discontinuity and confusion here.

The top level description: 'bytes is a different type to text[unicode]
and casting between them must be explicit' is completely correct in
Python 3: I didn't quibble about that (and AFAIK never have).

That's separate from the implementation issues I have mentioned in this
thread and previous ones.

Arguing that implicit casting is a good idea isn't what I was doing,
nor what Nick was rebutting, AFAICT.

-Rob


Re: [Python-Dev] Python 3.x and bytes

2011-05-18 Thread Georg Brandl
On 19.05.2011 00:39, Greg Ewing wrote:
 Ethan Furman wrote:
 
 some_var[3] == b'd'
 
 1) a check to see if the bytes instance is length 1
 2) a check to see if
    i) the other object is an int, and
    ii) 0 <= other_obj < 256
 3) if 1 and 2, make the comparison instead of returning NotImplemented?
 
 It might seem convenient, but I'd worry that it would lead to
 even more confusion in other ways. If someone sees that
 
 some_var[3] == b'd'
 
 is true, and that
 
 some_var[3] == 100
 
 is also true, they might expect to be able to do things
 like
 
 n = b'd' + 1
 
 and get 101... or maybe b'e'...

Maybe they should :)

Georg




Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Benjamin Peterson
2011/5/17 Ethan Furman et...@stoneleaf.us:
 Considering that ord() still works fine, I'm not sure why it was done this
 way.

I agree that this change was unfortunate and not too useful in practice.


 Is there code out there that is using this list of int's interface, or is
 there time to make changes to bytes?

I don't doubt there is, and I'm afraid it's far too late to change this.



-- 
Regards,
Benjamin


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Raymond Hettinger

On May 17, 2011, at 5:27 PM, Ethan Furman wrote:

 The bytes type in Python 3 does not feel very consistent.
 
 For example:
 
 --> some_var = 'abcdef'
 --> some_var
 'abcdef'
 --> some_var[3]
 'd'
 --> some_other_var = b'abcdef'
 --> some_other_var
 b'abcdef'
 --> some_other_var[3]
 100
 
 
 On the one hand we have the 'bytes are ascii data' type interface,

This is incidental.  Bytes can, and often do, contain non-ascii encoded
text, plain binary data, structs, raw data read off a disk, etc.

 and on the other we have the 'bytes are a list of integers between 0 - 256' 
 interface.  And trying to use the two is not intuitive:
 
 --> some_other_var[3] == b'd'
 False
 
 When I'm parsing a .dbf file and extracting field types from the byte stream, 
 I'm not thinking, okay, 67 is a Character field -- what I'm thinking is, 
 b'C' is a Character field.
 
 Considering that ord() still works fine, I'm not sure why it was done this 
 way.
 
 Is there code out there that is using this list of int's interface,

Yes.

 or is there time to make changes to bytes?

No.
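For the .dbf use case in the quoted message, one workable pattern is to key a lookup table on one-byte slices, so the code reads in terms of b'C' rather than 67. This is a minimal sketch: the mapping lists only a few dBase field-type codes, and the function name and descriptor bytes are made up.

```python
# Hypothetical mapping from dBase field-type codes to readable names.
FIELD_TYPES = {
    b'C': 'Character',
    b'D': 'Date',
    b'L': 'Logical',
    b'N': 'Numeric',
}


def field_type(descriptor, offset):
    """Look up the field type stored at descriptor[offset].

    Slicing (offset:offset + 1) yields a length-1 bytes object such as b'C',
    whereas descriptor[offset] would yield the int 67.
    """
    code = descriptor[offset:offset + 1]
    return FIELD_TYPES.get(code, 'unknown')


# Equivalent int-keyed table, if plain indexing is preferred:
FIELD_TYPES_BY_INT = {code[0]: name for code, name in FIELD_TYPES.items()}

print(field_type(b'NAME\x00C', 5))           # Character
print(FIELD_TYPES_BY_INT[b'NAME\x00C'[5]])   # Character
```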


Raymond


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Nick Coghlan
On Wed, May 18, 2011 at 8:27 AM, Ethan Furman et...@stoneleaf.us wrote:
 On the one hand we have the 'bytes are ascii data' type interface, and on
 the other we have the 'bytes are a list of integers between 0 - 256'
 interface.

No. Bytes are a list of integers between 0-256. End of story. Using
them to represent text as well was precisely the problem with 2.x
8-bit strings, since the boundaries got blurred.

However, as a matter of practicality, many byte-oriented protocols use
ASCII to make elements of the protocol readable by humans. The
text-like elements of the bytes and bytearray types are a concession
to the existence of those protocols. However, that doesn't make them
text - they're still binary data streams. If you want to treat them as
text, convert them to str objects first (e.g. that's what
urllib.parse does internally in order to operate on bytes and
bytearray instances).
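As an illustration of that approach (a sketch, not urllib's own code): urllib.parse.urlparse accepts bytes and returns bytes components, and a decode/encode round trip is lossless for ASCII data. Decoding with latin-1 is a common variant that round-trips every byte value.

```python
from urllib.parse import urlparse

# urlparse accepts bytes and returns bytes components.
parts = urlparse(b'http://example.com/path?q=1')
print(parts.scheme, parts.netloc, parts.query)   # b'http' b'example.com' b'q=1'

# Converting to str first and back is lossless for ASCII input ...
raw = b'GET /index.html HTTP/1.0'
text = raw.decode('ascii')
assert text.split()[1] == '/index.html'
assert text.encode('ascii') == raw

# ... and latin-1 maps every byte value 0-255 to one code point, so it
# round-trips arbitrary binary data as well.
every_byte = bytes(range(256))
assert every_byte.decode('latin-1').encode('latin-1') == every_byte
```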

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Robert Collins
On Wed, May 18, 2011 at 3:13 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On Wed, May 18, 2011 at 8:27 AM, Ethan Furman et...@stoneleaf.us wrote:
 On the one hand we have the 'bytes are ascii data' type interface, and on
 the other we have the 'bytes are a list of integers between 0 - 256'
 interface.

 No. Bytes are a list of integers between 0-256. End of story. Using
 them to represent text as well was precisely the problem with 2.x
 8-bit strings, since the boundaries got blurred.

 However, as a matter of practicality, many byte-oriented protocols use
 ASCII to make elements of the protocol readable by humans. The
 text-like elements of the bytes and bytearray types are a concession
 to the existence of those protocols. However, that doesn't make them
 text - they're still binary data streams. If you want to treat them as
 text, convert them to str objects first (e.g. that's what
 urllib.parse does internally in order to operate on bytes and
 bytearray instances).

This is not a useful argument - it's an implementation choice in
Python 3, and urlparse converting bytes to 'str' to operate on them is
at best a kludge - you're forcing 5 times the storage (the original
bytes + 4 bytes-per-byte when it's decoded into unicode) to work on
something which is defined as a BNF *that uses ascii*.

The Python 2 confusion was deplorable, but it doesn't make the Python
3 situation better: it's different, but still very awkward for people
to write code that is correct and fast in.

It's probably too late to change, but please don't try to argue that
it's correct: the continued confusion of folk running into this is
evidence that confusion *is happening*. Treat that as evidence and
think about how to fix it going forward.

-Rob


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Nick Coghlan
On Wed, May 18, 2011 at 1:23 PM, Robert Collins
robe...@robertcollins.net wrote:
 The Python 2 confusion was deplorable, but it doesn't make the Python
 3 situation better: it's different, but still very awkward for people
 to write code that is correct and fast in.

When Python 3 goes wrong, it raises exceptions or executes the wrong
control flow. That's a vast improvement over silently corrupting the
data stream the way that 2.x does.
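A small illustration of the failure modes described above (a sketch): mixing bytes and str, or decoding non-ASCII bytes as ASCII, fails loudly in Python 3 instead of producing a muddled result.

```python
raw = b'caf\xc3\xa9'              # UTF-8 encoding of 'café'

try:
    raw + 'latte'                 # bytes + str is not allowed
except TypeError:
    print('TypeError: cannot mix bytes and str')

try:
    raw.decode('ascii')           # 0xc3 is outside the 7-bit ASCII range
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError at position', exc.start)   # ... position 3

print(raw.decode('utf-8') + ' latte')   # café latte: explicit decode, then combine
```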

If it really bothers anyone, they should feel free to implement and
promote their own ascii data type on PyPI. If it is explicitly
restricted to 7 bit characters, it may even avoid many of the problems
of silent corruption that the 2.x str had. Speculation on python-dev
isn't going to be convincing here, though: only code in real use will
be effective on that front.
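As a rough sketch of what such a type might look like (the class name AsciiBytes and its behaviour are invented for illustration; no particular PyPI package is implied), a bytes subclass can reject non-ASCII input and return one-byte objects from integer indexing, so that data[3] == b'd' holds:

```python
class AsciiBytes(bytes):
    """Hypothetical 7-bit-only bytes type whose items are 1-byte objects."""

    def __new__(cls, data=b''):
        raw = bytes(data)
        if any(byte > 0x7F for byte in raw):
            raise ValueError('AsciiBytes only accepts 7-bit ASCII data')
        return super().__new__(cls, raw)

    def __getitem__(self, index):
        item = super().__getitem__(index)   # int for an index, bytes for a slice
        if isinstance(item, int):
            return AsciiBytes(bytes([item]))
        return AsciiBytes(item)


data = AsciiBytes(b'abcdef')
print(data[3] == b'd', data[-1] == b'f', data[1:3] == b'bc')   # True True True

try:
    AsciiBytes(b'\xff')
except ValueError as exc:
    print(exc)
```

Iteration still yields ints, because only __getitem__ is overridden; a fuller type would have to decide that behaviour too.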

As far as the memory and runtime overhead goes, yes, that's a real
problem (indeed, that overhead is *why* bytes and bytearray have as
many str-like features as they do). PEP 393 is intended to at least
alleviate the memory burden of the Unicode text.
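On CPython 3.3 and later, where PEP 393 landed, the per-character cost of a str depends on the widest character it contains, which is easy to observe with sys.getsizeof. Header sizes vary by version and platform, so this CPython-specific sketch only compares differences between two lengths.

```python
import sys


def bytes_per_char(ch):
    """Marginal storage cost of one more copy of ch in a str (CPython only)."""
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


print(bytes_per_char('a'))           # 1: ASCII
print(bytes_per_char('\xe9'))        # 1: Latin-1 range
print(bytes_per_char('\u0101'))      # 2: needs UCS-2
print(bytes_per_char('\U0001F600'))  # 4: needs UCS-4
```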

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Greg Ewing

Ethan Furman wrote:

On the one hand we have the 'bytes are ascii data' type interface, and 
on the other we have the 'bytes are a list of integers between 0 - 256' 
interface.


I think the weird part is that there exists a literal for
writing a byte array as an ascii string, and furthermore
that it's the *only* kind of literal available for bytes.

Personally I think that the default literal syntax for
bytes, and also the form produced by repr(), should have
been something more neutral, such as hex, with the ascii
form available for use when it makes sense. Currently if
you want to write a bytes literal in hex, you have to
say something like

   some_var = b'\xde\xad\xbe\xef'

which is ugly and unreadable. Much nicer would be

   some_var = x'deadbeef'
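Python has no x'...' literal, but for display purposes a neutral hex view can be produced from the bytes object itself: bytes.hex() needs Python 3.5 or later, while binascii.hexlify also works on older versions. The default repr, by contrast, mixes ASCII characters with \x escapes. A short sketch:

```python
import binascii

data = bytes.fromhex('41 42 de ad')

print(repr(data))                 # b'AB\xde\xad'
print(data.hex())                 # 4142dead
print(binascii.hexlify(data))     # b'4142dead'
assert bytes.fromhex(data.hex()) == data   # hex text round-trips
```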

As for


--> some_other_var[3] == b'd'


there ought to be a literal for specifying an integer
using an ascii character, so you could say something like

  if some_other_var[3] == c'd':

which would be equivalent to

  if some_other_var[3] == ord(b'd'):

but without the overhead of computing the value each time
at run time.

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Greg Ewing

Robert Collins wrote:

urlparse converting bytes to 'str' to operate on them is
at best a kludge - you're forcing 5 times the storage (the original
bytes + 4 bytes-per-byte when it's decoded into unicode)


That is itself an implementation detail of current Python,
though, due to it only having one internal representation of
unicode.

In principle there could be a form of str that keeps its
data encoded in latin1, in which case constructing it from
a byte string could simply involve storing a pointer to the
original bytes data.

--
Greg


Re: [Python-Dev] Python 3.x and bytes

2011-05-17 Thread Glenn Linderman

On 5/17/2011 10:39 PM, Greg Ewing wrote:

Personally I think that the default literal syntax for
bytes, and also the form produced by repr(), should have
been something more neutral, such as hex, with the ascii
form available for use when it makes sense.



Much nicer would be

   some_var = x'deadbeef'

As for


--> some_other_var[3] == b'd'


there ought to be a literal for specifying an integer
using an ascii character, so you could say something like

  if some_other_var[3] == c'd':

which would be equivalent to

  if some_other_var[3] == ord(b'd'):

but without the overhead of computing the value each time
at run time.


+1

Seems this could be added compatibly?