#2116: Application layer protocol for transfering RPC messages + utf8 decoding
error
-------------------+--------------------------------------------------------
Reporter: bro | Owner:
Type: patch | Status: new
Priority: major | Milestone: Future
Component: other | Version: 1.3.5
Keywords: |
-------------------+--------------------------------------------------------
Comment(by bro):
andar, after reading your comments I tested this out more thoroughly. The
problem wasn't that easy to reproduce, but I think I've got it.
As far as I can understand, the problem is caused by two things, which
results in errors with the current protocol - ''The rencode-must-fail
Protocol''. The problem isn't ''caused'' by zlib failing, as I thought
when writing the ticket, but it seems zlib fails because of another
reason.
== The rencode-must-fail Protocol ==
1) If two messages of 50 bytes are received as 100 bytes, ''zlib'' will
decompress the first 50 bytes, and put the next 50 bytes in the
''unused_data'' buffer.
This works well, and the protocol depends on this to work.
However, if only the first 25 bytes are received of the 100 bytes, zlib
will not raise any errors, but decompress the 25 bytes. That isn't a
problem with zlib, but it is the first problem with the protocol.
2) When zlib doesn't fail when decompressing only 25 bytes, it's up to
rencode to fail (for the protocol to function), and it does. It will not
successfully decode only 25 bytes (it's more after being decompressed, but
lets ignore that) of 50, so it raises an error and the protocol will then
wait for more data.
This also works well...usually.
However, if less bytes are are received, e.g. 48, '''rencode might not
complain''' when decoding the 48 bytes, even though it's missing 2 bytes.
The next time data arrives (52 bytes where the first 2 is from the last
message), it will not successfully decompress as the next message starts 2
bytes later, and spits out:
{{{
Error -3 while decompressing: incorrect header check
}}}
----
When sending a message of 50 bytes, it will of course never receive just
48 bytes, and that is why this protocol works in most cases. Usually the
messages aren't split, and when they are, they're split at a place so that
rencode fails (which makes sure it will save the data and wait for more).
But when multiple messages are sent, and split by various reasons, and the
split is at a place (near the end of the message) where rencode won't
complain, it will always fail to decompress later messages, and a
reconnect to the daemon is required.
I've created a test ''_test_rencode_fail_protocol'' which illustrates this
problem.
Testing with different values for the ''packet_size'' variable you'll see
that rencode will accept a few bytes bytes '''less''' than the actual
message length. Using a value of 53 or greater for the packet_size
variable will work because the first packet is 53 bytes. Using 49-52 will
cause the test to fail (because rencode doesn't fail as the protocol
depends upon).
Also illustrated by:
{{{
>>> import rencode
>>> data = '(0, 1, {"key": "value"})'
>>> dump = rencode.dumps(eval(data))
>>> dump = dump[:-2]
>>> print rencode.loads(dump)
(0, 1, {u'key': u'val\x00S'})
}}}
I tested the same with pickle which also seems to accept a slightly
truncated message.
== Performance ==
I've timed the old code with 1600 torrents, and it takes a long time to
process the torrent list. As expected the problem isn't that zlib uses a
lot of time, it's rencode. When timing all calls on zlib.decompress and
rencode.loads for the torrent list message, which was received in about 29
packets meaning 29 calls to zlib.decompress and rencode.loads,
zlib.decompress used in total 0.53 seconds, while rencode.loads used about
12 seconds.
With this patch, it is reduced to only one call to zlib.decompress and
rencode.load which takes about 0.7 seconds with my torrent list.
In regards to the comments about saving space with pickle, it's actually
in practice not true. Pickle will use less space than rencode with the
same data, but zlib is able to compress the rencode dump more than the
pickle dump, so in practice, rencode+zlib uses less space on the wire ;-)
== rencode ==
I tried the Cython version of rencode but it fails as well. I found out
that the decoding issue is caused by a "bad" character in the
''tracker_status'' field, more precisely '''\xe5''' which is the character
å.
{{{
>>> import rencode, pickle
>>> data = eval("['String with \xe5']")
>>> dump = pickle.dumps(data)
>>> pickle.loads(dump)
['String with \xe5']
>>> dump = rencode.dumps(data)
>>> rencode.loads(dump)
decode error: 'utf8' codec can't decode byte 0xe5 in position 12:
unexpected end of data
input: String with
...
...
TypeError: decoder must return a tuple (object,integer)
}}}
Updated version of the patch using rencode (It's also simplified a bit):
https://github.com/bendikro/deluge/commits/1.3-dev
--
Ticket URL: <http://dev.deluge-torrent.org/ticket/2116#comment:3>
Deluge <http://deluge-torrent.org/>
Deluge project
--
You received this message because you are subscribed to the Google Groups
"Deluge Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/deluge-dev?hl=en.