[jira] [Commented] (AVRO-2281) Performance improvement in ruby binary decoder and encoder

ASF GitHub Bot (JIRA) Fri, 07 Dec 2018 09:09:30 -0800


    [ https://issues.apache.org/jira/browse/AVRO-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713096#comment-16713096 ]


ASF GitHub Bot commented on AVRO-2281:
--------------------------------------

dkulp closed pull request #401: AVRO-2281: Optimize ruby binary encoder/decoder
URL: https://github.com/apache/avro/pull/401
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/lang/ruby/lib/avro/io.rb b/lang/ruby/lib/avro/io.rb
index 5961a4802..31107217a 100644
--- a/lang/ruby/lib/avro/io.rb
+++ b/lang/ruby/lib/avro/io.rb
@@ -43,7 +43,7 @@ def initialize(reader)
       end
 
       def byte!
-        @reader.read(1).unpack('C').first
+        @reader.readbyte
       end
 
       def read_null
@@ -76,7 +76,7 @@ def read_float
         # The float is converted into a 32-bit integer using a method
         # equivalent to Java's floatToIntBits and then encoded in
         # little-endian format.
-        @reader.read(4).unpack('e')[0]
+        read_and_unpack(4, 'e'.freeze)
       end
 
       def read_double
@@ -84,7 +84,7 @@ def read_double
         # The double is converted into a 64-bit integer using a method
         # equivalent to Java's doubleToLongBits and then encoded in
         # little-endian format.
-        @reader.read(8).unpack('E')[0]
+        read_and_unpack(8, 'E'.freeze)
       end
 
       def read_bytes
@@ -97,7 +97,7 @@ def read_string
         # A string is encoded as a long followed by that many bytes of
         # UTF-8 encoded character data.
         read_bytes.tap do |string|
-          string.force_encoding("UTF-8") if string.respond_to? :force_encoding
+          string.force_encoding('UTF-8'.freeze) if string.respond_to? :force_encoding
         end
       end
 
@@ -144,6 +144,23 @@ def skip_string
       def skip(n)
         reader.seek(reader.tell() + n)
       end
+
+      private
+
+      # Optimize unpacking strings when `unpack1` is available (ruby >= 2.4)
+      if String.instance_methods.include?(:unpack1)
+
+        def read_and_unpack(byte_count, format)
+          @reader.read(byte_count).unpack1(format)
+        end
+
+      else
+
+        def read_and_unpack(byte_count, format)
+          @reader.read(byte_count).unpack(format)[0]
+        end
+
+      end
     end
 
     # Write leaf values
@@ -188,7 +205,7 @@ def write_long(n)
       # equivalent to Java's floatToIntBits and then encoded in
       # little-endian format.
       def write_float(datum)
-        @writer.write([datum].pack('e'))
+        @writer.write([datum].pack('e'.freeze))
       end
 
       # A double is written as 8 bytes.
@@ -196,7 +213,7 @@ def write_float(datum)
       # equivalent to Java's doubleToLongBits and then encoded in
       # little-endian format.
       def write_double(datum)
-        @writer.write([datum].pack('E'))
+        @writer.write([datum].pack('E'.freeze))
       end
 
       # Bytes are encoded as a long followed by that many bytes of data.
@@ -208,7 +225,7 @@ def write_bytes(datum)
       # A string is encoded as a long followed by that many bytes of
       # UTF-8 encoded character data
       def write_string(datum)
-        datum = datum.encode('utf-8') if datum.respond_to? :encode
+        datum = datum.encode('utf-8'.freeze) if datum.respond_to? :encode
         write_bytes(datum)
       end
 
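
The feature-detection pattern in the decoder hunk above can be tried in isolation. The following is a minimal sketch, not the Avro class itself: `SketchDecoder` is a hypothetical stand-in that keeps only `read_float`, `read_double`, and the private `read_and_unpack` helper, and it uses `String.method_defined?` as an assumed equivalent of the `instance_methods.include?` check in the diff.

```ruby
require 'stringio'

# Hypothetical stand-in for the decoder touched by the diff; only the
# float/double readers and the private helper are reproduced here.
class SketchDecoder
  def initialize(reader)
    @reader = reader
  end

  # 4 bytes, little-endian single-precision float.
  def read_float
    read_and_unpack(4, 'e'.freeze)
  end

  # 8 bytes, little-endian double-precision float.
  def read_double
    read_and_unpack(8, 'E'.freeze)
  end

  private

  # The branch is evaluated once, when the class body is loaded, so no
  # per-call check is paid at decode time.
  if String.method_defined?(:unpack1)
    def read_and_unpack(byte_count, format)
      @reader.read(byte_count).unpack1(format)
    end
  else
    def read_and_unpack(byte_count, format)
      @reader.read(byte_count).unpack(format)[0]
    end
  end
end

# 1.5 and 2.25 are exactly representable in both float widths.
bytes = [1.5].pack('e') + [2.25].pack('E')
decoder = SketchDecoder.new(StringIO.new(bytes))
float_value = decoder.read_float
double_value = decoder.read_double
```

Both branches return the same value for a single-directive format string; the `unpack1` branch simply skips building the one-element array.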


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Performance improvement in ruby binary decoder and encoder
> ----------------------------------------------------------
>
>                 Key: AVRO-2281
>                 URL: https://issues.apache.org/jira/browse/AVRO-2281
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: ruby
>    Affects Versions: 1.9.0
>            Reporter: Kyle Phelps
>            Priority: Minor
>
> The Ruby binary decoder has some inefficient memory usage patterns. The
> decoding process relies heavily on `unpack`, which allocates an array even
> though only the first element of the result is used. On Ruby 2.4 and later
> we can use `unpack1` instead, which avoids allocating the unused array. In
> `byte!`, we can go one step further and call the `readbyte` method provided
> by IO directly; this improves the performance of `byte!` by about 50%.
> Additionally, a few string literals in the encoder and decoder should be
> frozen to avoid unnecessary string allocations.
>
> With these changes, I've seen about a 20% performance improvement when
> decoding.
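
The two smaller changes described above, `readbyte` in place of `read(1).unpack('C').first` and frozen string literals, can be checked with a short sketch. It assumes a `StringIO` reader (the real reader may be any IO-like object) and makes no claim about the reported speedups; it only shows that the results agree and where behavior differs.

```ruby
require 'stringio'

data = [0x7f, 0x02].pack('C*')

# Old approach: read a 1-byte string, unpack it into an array, take the
# first element.
old_style = StringIO.new(data).read(1).unpack('C').first

# New approach: ask the reader for the byte value directly.
new_style = StringIO.new(data).readbyte

# A literal followed by .freeze is deduplicated by the interpreter
# (Ruby >= 2.1), so both expressions yield the same object instead of
# allocating a new string each time.
same_object = 'UTF-8'.freeze.equal?('UTF-8'.freeze)

# One behavioral difference: at end of input, readbyte raises EOFError,
# whereas read(1) returns nil (and the old code then failed on nil.unpack).
eof_error = begin
  StringIO.new('').readbyte
  nil
rescue EOFError => e
  e.class
end
```

Callers that relied on the old failure mode at end of input would see `EOFError` rather than `NoMethodError` after the change.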



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
