Re: [Deluge] #2116: Application layer protocol for transfering RPC messages + utf8 decoding error

Deluge Wed, 27 Jun 2012 15:08:34 -0700

#2116: Application layer protocol for transfering RPC  messages + utf8 decoding
error
-------------------+--------------------------------------------------------
 Reporter:  bro    |       Owner:        
     Type:  patch  |      Status:  new   
 Priority:  major  |   Milestone:  Future
Component:  other  |     Version:  1.3.5 
 Keywords:         |  
-------------------+--------------------------------------------------------


Comment(by andar):

 Great analysis work.

 So it looks like we have two problems: an issue with the RPC messaging and
 an issue with rencode.  It looks like you've solved the first one and your
 reasoning makes sense, so I'll work on getting this applied to master.

 I've taken a look at the second problem involving rencode and I at least
 understand why it's happening, but I'm still not sure how to go about
 fixing it.  With rencode we expect all strings (byte strings) to be either
 utf8 or ascii encoded (ascii is a subset of utf8, which is why it works).
 If a unicode object is passed into rencode, it will first encode it into a
 utf8 bytestring.  During a decode of a string, rencode will attempt to
 decode the string as utf8 so that it will return a unicode object.

 {{{
 >>> data = u"\xe5"
 >>> print data
 å
 >>> rencode.dumps(data)
 '\x82\xc3\xa5'
 >>> rencode.loads(rencode.dumps(data))
 u'\xe5'
 }}}

 {{{
 >>> data = "foo"
 >>> rencode.dumps(data)
 '\x83foo'
 >>> rencode.loads(rencode.dumps(data))
 u'foo'
 }}}

 When the string is passed in as a unicode object or as an ascii
 bytestring, rencode behaves as expected, returning a unicode object when
 doing a loads().

 The problem arises when you pass in a bytestring that is neither ascii nor
 utf8 encoded and then try to loads() from rencode.

 {{{
 >>> data = "\xe5"
 >>> rencode.dumps(data)
 '\x81\xe5'
 >>> rencode.loads(rencode.dumps(data))
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "rencode.pyx", line 498, in rencode._rencode.loads (rencode/rencode.c:5439)
   File "rencode.pyx", line 466, in rencode._rencode.decode (rencode/rencode.c:5131)
   File "rencode.pyx", line 386, in rencode._rencode.decode_fixed_str (rencode/rencode.c:4159)
   File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 0: unexpected end of data
 }}}

 The dumps() works just fine because rencode does nothing to bytestrings
 passed in, as it expects these to be proper ascii or utf8 encoded strings.
 When it comes to the loads(), however, rencode will attempt to decode the
 bytestring as utf8 to produce a unicode object, and that decode is what
 fails.
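
 For what it's worth, the same failure can be reproduced with plain Python
 codecs and no rencode at all.  This is only a sketch: it assumes the stray
 0xe5 byte came from a latin-1 encoded string, which is a guess on my part
 and not something confirmed in this ticket.

```python
# Sketch only: plain Python codecs, no rencode involved.
text = u"\xe5"  # the character used in the examples above

utf8_bytes = text.encode("utf-8")      # two bytes: 0xc3 0xa5
latin1_bytes = text.encode("latin-1")  # one byte: 0xe5 (assumed origin of the bad string)

# A proper utf8 bytestring decodes back to the original unicode object.
assert utf8_bytes.decode("utf-8") == text

# A lone 0xe5 byte opens a multi-byte utf8 sequence that never completes,
# so a utf8 decode of it raises UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

 So the bytes themselves are fine to carry around; it is only the utf8
 decode on the receiving side that rejects them.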

 A couple of options to fix this:

   * Don't. Simply enforce the fact that we should be using proper utf8
 strings and fix the source of these malformed strings.

   * Allow the use of different encoded bytestrings by not attempting a
 utf8 decode during the loads().  This means that if you pass a unicode
 object into rencode.dumps(), you will not get a unicode object out on the
 subsequent loads() but rather a bytestring in an unknown encoding.
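
 The trade-off between the two options can be sketched in plain Python.
 `decode_payload` and its `strict_utf8` flag are hypothetical names made up
 for this illustration; they are not part of rencode and only stand in for
 the string handling step of a loads().

```python
def decode_payload(raw, strict_utf8=True):
    """Hypothetical stand-in for the string step of a loads() call."""
    if strict_utf8:
        # Option 1: insist on utf8; malformed input raises UnicodeDecodeError.
        return raw.decode("utf-8")
    # Option 2: skip the decode and hand the bytestring back untouched.
    return raw


good = u"\xe5".encode("utf-8")  # proper utf8 bytestring
bad = b"\xe5"                   # bytestring in some other encoding

# Option 1 returns a unicode object for well-formed input.
strict_result = decode_payload(good)

# Option 2 accepts the odd bytestring without complaint ...
lenient_bad = decode_payload(bad, strict_utf8=False)

# ... but a unicode object that went in now comes back out as utf8 bytes.
lenient_good = decode_payload(good, strict_utf8=False)
```

 With the second option every caller that currently relies on getting a
 unicode object back would have to do its own decoding.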

 I'm really not sure if the latter option would have any effect on Deluge
 or not.  Quite frankly our handling of string encodings across the board
 is pretty messed up so it's a bit scary changing something as fundamental
 as this.  That being said, if you have tried using pickle and it has
 worked for you, this may turn out ok for us as well, as I assume this is
 how pickle approaches encoded bytestrings.  On the other hand, I kind of
 like the idea of enforcing the use of utf8 within Deluge as we really
 shouldn't be using any other encoding for anything.
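
 If we do go the enforcement route, something like the following could help
 track down where the malformed strings come from before they are handed to
 dumps().  This is a rough sketch, not existing Deluge code: `find_non_utf8`
 is a hypothetical helper and the sample message is made up.

```python
def find_non_utf8(value, path="message"):
    """Return the paths of bytestrings inside `value` that are not valid utf8."""
    problems = []
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(path)
    elif isinstance(value, dict):
        # Only the values are checked here, not the keys.
        for key, item in value.items():
            problems.extend(find_non_utf8(item, "%s[%r]" % (path, key)))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            problems.extend(find_non_utf8(item, "%s[%d]" % (path, index)))
    return problems


# Made-up message for illustration only.
message = (1, "core.example_method", [b"ok", b"\xe5"], {"name": b"\xc3\xa5"})
bad_paths = find_non_utf8(message)
```

 Here only the lone 0xe5 bytestring in the list is reported; the utf8
 encoded value in the dict passes.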

 Anybody have any thoughts on this?  I suppose we could do some tests and
 see if removing the string decoding has any real effects.

-- 
Ticket URL: <http://dev.deluge-torrent.org/ticket/2116#comment:4>
Deluge <http://deluge-torrent.org/>
Deluge project

