#2116: Application layer protocol for transfering RPC  messages + utf8 decoding
error
-------------------+--------------------------------------------------------
 Reporter:  bro    |       Owner:        
     Type:  patch  |      Status:  new   
 Priority:  major  |   Milestone:  Future
Component:  other  |     Version:  1.3.5 
 Keywords:         |  
-------------------+--------------------------------------------------------

Comment(by bro):

 andar, after reading your comments I tested this out more thoroughly. The
 problem wasn't that easy to reproduce, but I think I've got it.

 As far as I can understand, the problem is caused by two things, which
 results in errors with the current protocol - ''The rencode-must-fail
 Protocol''. The problem isn't ''caused'' by zlib failing, as I thought
 when writing the ticket, but it seems zlib fails because of another
 reason.

 == The rencode-must-fail Protocol ==

 1) If two messages of 50 bytes are received as 100 bytes, ''zlib'' will
 decompress the first 50 bytes, and put the next 50 bytes in the
 ''unused_data'' buffer.
 This works well, and the protocol depends on this to work.

 However, if only the first 25 bytes are received of the 100 bytes, zlib
 will not raise any errors, but decompress the 25 bytes. That isn't a
 problem with zlib, but it is the first problem with the protocol.

 2) When zlib doesn't fail when decompressing only 25 bytes, it's up to
 rencode to fail (for the protocol to function), and it does. It will not
 successfully decode only 25 bytes (it's more after being decompressed, but
 lets ignore that) of 50, so it raises an error and the protocol will then
 wait for more data.
 This also works well...usually.

 However, if less bytes are are received, e.g. 48, '''rencode might not
 complain''' when decoding the 48 bytes, even though it's missing 2 bytes.

 The next time data arrives (52 bytes where the first 2 is from the last
 message), it will not successfully decompress as the next message starts 2
 bytes later, and spits out:
 {{{
 Error -3 while decompressing: incorrect header check
 }}}

 ----

 When sending a message of 50 bytes, it will of course never receive just
 48 bytes, and that is why this protocol works in most cases. Usually the
 messages aren't split, and when they are, they're split at a place so that
 rencode fails (which makes sure it will save the data and wait for more).

 But when multiple messages are sent, and split by various reasons, and the
 split is at a place (near the end of the message) where rencode won't
 complain, it will always fail to decompress later messages, and a
 reconnect to the daemon is required.

 I've created a test ''_test_rencode_fail_protocol'' which illustrates this
 problem.
 Testing with different values for the ''packet_size'' variable you'll see
 that rencode will accept a few bytes bytes '''less''' than the actual
 message length. Using a value of 53 or greater for the packet_size
 variable will work because the first packet is 53 bytes. Using 49-52 will
 cause the test to fail (because rencode doesn't fail as the protocol
 depends upon).

 Also illustrated by:

 {{{
 >>> import rencode
 >>> data = '(0, 1, {"key": "value"})'
 >>> dump = rencode.dumps(eval(data))
 >>> dump = dump[:-2]
 >>> print rencode.loads(dump)
 (0, 1, {u'key': u'val\x00S'})
 }}}

 I tested the same with pickle which also seems to accept a slightly
 truncated message.


 == Performance ==

 I've timed the old code with 1600 torrents, and it takes a long time to
 process the torrent list. As expected the problem isn't that zlib uses a
 lot of time, it's rencode. When timing all calls on zlib.decompress and
 rencode.loads for the torrent list message, which was received in about 29
 packets meaning 29 calls to zlib.decompress and rencode.loads,
 zlib.decompress used in total 0.53 seconds, while rencode.loads used about
 12 seconds.
 With this patch, it is reduced to only one call to zlib.decompress and
 rencode.load which takes about 0.7 seconds with my torrent list.

 In regards to the comments about saving space with pickle, it's actually
 in practice not true. Pickle will use less space than rencode with the
 same data, but zlib is able to compress the rencode dump more than the
 pickle dump, so in practice, rencode+zlib uses less space on the wire ;-)

 == rencode ==

 I tried the Cython version of rencode but it fails as well. I found out
 that the decoding issue is caused by a "bad" character in the
 ''tracker_status'' field, more precisely '''\xe5''' which is the character
 å.

 {{{
 >>> import rencode, pickle
 >>> data = eval("['String with \xe5']")
 >>> dump = pickle.dumps(data)
 >>> pickle.loads(dump)
 ['String with \xe5']
 >>> dump = rencode.dumps(data)
 >>> rencode.loads(dump)
 decode error: 'utf8' codec can't decode byte 0xe5 in position 12:
 unexpected end of data
 input: String with
 ...
 ...
 TypeError: decoder must return a tuple (object,integer)
 }}}

 Updated version of the patch using rencode (It's also simplified a bit):

 https://github.com/bendikro/deluge/commits/1.3-dev

-- 
Ticket URL: <http://dev.deluge-torrent.org/ticket/2116#comment:3>
Deluge <http://deluge-torrent.org/>
Deluge project

-- 
You received this message because you are subscribed to the Google Groups 
"Deluge Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/deluge-dev?hl=en.

Reply via email to