[
https://issues.apache.org/jira/browse/THRIFT-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ilya Maykov updated THRIFT-1189:
--------------------------------
Attachment: patch-THRIFT-1189-combined.txt
Single combined patch, and this time against the SVN repo instead of our own
git copy.
Patch is against svn.apache.org/repos/asf/thrift/tags/thrift-0.6.1
> Ruby deserializer speed improvements
> ------------------------------------
>
> Key: THRIFT-1189
> URL: https://issues.apache.org/jira/browse/THRIFT-1189
> Project: Thrift
> Issue Type: Improvement
> Components: Ruby - Library
> Affects Versions: 0.6.1
> Environment: OS X 10.6 i686 / Linux x86_64
> Ruby 1.8.7-p334 / Ruby 1.9.2-p180
> Reporter: Ilya Maykov
> Attachments: patch-THRIFT-1189-2.txt, patch-THRIFT-1189-3.txt,
> patch-THRIFT-1189-combined.txt, patch-THRIFT-1189.txt, thrift_perf_test.tar.gz
>
>
> I have a patch to the Ruby libraries that greatly increases deserializer
> speed. We've been running our production systems at Ooyala with this patch
> for weeks, but until now it was a standalone file that overrode some
> methods inside the Thrift code, loaded after requiring thrift. Over the
> weekend, I ported it into a proper patch against the 0.6.1 tag and would like
> to contribute it back.
> I originally wrote this while trying to speed up some code we have that has
> to deserialize a lot of thrift objects. I ran it under ruby-prof and noticed
> that a huge amount of time was spent inside thrift deserialization code.
> Digging deeper still, I saw a lot of time spent in String allocation and copy
> methods. It turns out that there is some low-hanging fruit:
> 1) XProtocol#read_byte() methods end up calling read_all(1), getting back a
> string of size 1, and converting it to a byte. This is an unnecessary string
> alloc + copy that's pretty easy to get around. The patch does this by adding
> a read_byte method to the XTransport classes. The transports that have
> buffering of some kind (BufferedTransport, FramedTransport,
> MemoryBufferTransport) can look up the byte, convert to unsigned, and return
> it without doing the extra alloc + copy.
> 2) The BaseTransport#read_all() method always allocates an empty buffer
> string, reads bytes from the underlying transport, then appends the result to
> the buffer. This extra string alloc + copy is also removed in my patch as
> it's not needed.
> 3) Thrift::Struct#hash() is inefficient - it allocates an array and copies
> all struct fields into it. Replaced with logic copied from Apache's Java
> HashCodeBuilder class.
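
For readers who want a concrete picture of items 1) and 3), here is a rough
sketch. It is not the code from the patch. SketchBufferTransport,
to_signed_byte and SketchStruct are hypothetical names made up for
illustration, and the sketch assumes a current Ruby (String#b, #getbyte and
#byteslice). The constants 17 and 37 are the defaults of Apache Commons'
HashCodeBuilder; the 32-bit mask is my own assumption to keep the running
total small.

```ruby
# Hypothetical stand-in for a buffering transport; NOT Thrift's actual class.
class SketchBufferTransport
  def initialize(data)
    @buf = data.b # binary-encoded copy of the input
    @index = 0
  end

  # Slow path: every call allocates a new String, even for a single byte.
  def read_all(size)
    chunk = @buf.byteslice(@index, size)
    raise EOFError, "not enough data" if chunk.nil? || chunk.bytesize < size
    @index += size
    chunk
  end

  # Fast path: look the byte up in the buffer and return it as an
  # unsigned Integer (0..255) without creating an intermediate String.
  def read_byte
    byte = @buf.getbyte(@index)
    raise EOFError, "not enough data" if byte.nil?
    @index += 1
    byte
  end
end

# Protocol-side helper (hypothetical): reinterpret an unsigned byte as signed.
def to_signed_byte(unsigned)
  unsigned > 0x7f ? unsigned - 0x100 : unsigned
end

# Hypothetical struct with a HashCodeBuilder-style hash: fold each field's
# hash into a running total instead of copying all fields into an Array.
class SketchStruct
  FIELDS = [:id, :name].freeze
  attr_accessor(*FIELDS)

  def initialize(id, name)
    @id = id
    @name = name
  end

  def hash
    total = 17
    FIELDS.each do |field|
      total = (total * 37 + send(field).hash) & 0xffffffff
    end
    total
  end

  def ==(other)
    other.is_a?(SketchStruct) && FIELDS.all? { |f| send(f) == other.send(f) }
  end
  alias eql? ==
end

# Both paths decode the same signed value; only the first allocates a String.
old_way = SketchBufferTransport.new("\xFE".b).read_all(1).unpack('c').first
new_way = to_signed_byte(SketchBufferTransport.new("\xFE".b).read_byte)
puts old_way == new_way
```

The point of the sketch is only where the allocations happen: read_all(1)
builds a one-byte String that is thrown away immediately, while read_byte
returns an Integer straight from the buffer.
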
> I've built a gem locally (I gave it version number 0.6.1.1) and wrote a
> simple benchmark to test the changes. The benchmark creates a struct,
> serializes it to a binary string, then deserializes it in a loop 10000 times
> (per protocol). Here are the results (all times are in seconds):
> || Benchmark || r1.8.7-p334/thrift-0.6.0 || r1.8.7-p334/thrift-0.6.1.1 || r1.9.2-p180/thrift-0.6.0 || r1.9.2-p180/thrift-0.6.1.1 ||
> | Deserialization: BinaryProtocol | 15.76 | 9.97 | 8.23 | 5.39 |
> | Deserialization: BinaryProtocolAccelerated | 11.65 | 4.14 | 5.73 | 3.15 |
> | Deserialization: CompactProtocol | 12.70 | 3.65 | 6.48 | 2.75 |
> | Hashing | 7.39 | 5.99 | 2.61 | 2.23 |
> | Equality | 3.84 | 2.93 | 1.24 | 0.96 |
> I will be attaching the patch and benchmark code shortly.
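
The benchmark described above (serialize once, then deserialize in a loop
10000 times) could be structured roughly as follows. This is not the attached
thrift_perf_test code. time_loop and SketchRecord are hypothetical names, and
Marshal is used purely as a stand-in so the sketch runs without the thrift
gem; it is not what the numbers in the table measure.

```ruby
require 'benchmark'

ITERATIONS = 10_000

# Runs the given block `iterations` times and returns the elapsed
# wall-clock time in seconds.
def time_loop(iterations = ITERATIONS)
  Benchmark.realtime { iterations.times { yield } }
end

# Stand-in payload. A real run would build a Thrift struct and use a Thrift
# serializer/deserializer pair for each protocol under test.
SketchRecord = Struct.new(:id, :name, :tags)
record = SketchRecord.new(42, "example", %w[a b c])

# Serialize once, outside the timed region.
bytes = Marshal.dump(record)

# Time only the deserialization loop.
elapsed = time_loop { Marshal.load(bytes) }
puts format("Deserialization (Marshal stand-in): %.2f s", elapsed)
```

Keeping serialization outside the timed block means the reported figure
covers deserialization only, which matches how the rows in the table are
labeled.
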
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira