[ https://issues.apache.org/jira/browse/AVRO-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713096#comment-16713096 ]
ASF GitHub Bot commented on AVRO-2281:
--------------------------------------
dkulp closed pull request #401: AVRO-2281: Optimize ruby binary encoder/decoder
URL: https://github.com/apache/avro/pull/401
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/lang/ruby/lib/avro/io.rb b/lang/ruby/lib/avro/io.rb
index 5961a4802..31107217a 100644
--- a/lang/ruby/lib/avro/io.rb
+++ b/lang/ruby/lib/avro/io.rb
@@ -43,7 +43,7 @@ def initialize(reader)
end
def byte!
- @reader.read(1).unpack('C').first
+ @reader.readbyte
end
def read_null
@@ -76,7 +76,7 @@ def read_float
# The float is converted into a 32-bit integer using a method
# equivalent to Java's floatToIntBits and then encoded in
# little-endian format.
- @reader.read(4).unpack('e')[0]
+ read_and_unpack(4, 'e'.freeze)
end
def read_double
@@ -84,7 +84,7 @@ def read_double
# The double is converted into a 64-bit integer using a method
# equivalent to Java's doubleToLongBits and then encoded in
# little-endian format.
- @reader.read(8).unpack('E')[0]
+ read_and_unpack(8, 'E'.freeze)
end
def read_bytes
@@ -97,7 +97,7 @@ def read_string
# A string is encoded as a long followed by that many bytes of
# UTF-8 encoded character data.
read_bytes.tap do |string|
- string.force_encoding("UTF-8") if string.respond_to? :force_encoding
+ string.force_encoding('UTF-8'.freeze) if string.respond_to? :force_encoding
end
end
@@ -144,6 +144,23 @@ def skip_string
def skip(n)
reader.seek(reader.tell() + n)
end
+
+ private
+
+ # Optimize unpacking strings when `unpack1` is available (ruby >= 2.4)
+ if String.instance_methods.include?(:unpack1)
+
+ def read_and_unpack(byte_count, format)
+ @reader.read(byte_count).unpack1(format)
+ end
+
+ else
+
+ def read_and_unpack(byte_count, format)
+ @reader.read(byte_count).unpack(format)[0]
+ end
+
+ end
end
# Write leaf values
@@ -188,7 +205,7 @@ def write_long(n)
# equivalent to Java's floatToIntBits and then encoded in
# little-endian format.
def write_float(datum)
- @writer.write([datum].pack('e'))
+ @writer.write([datum].pack('e'.freeze))
end
# A double is written as 8 bytes.
@@ -196,7 +213,7 @@ def write_float(datum)
# equivalent to Java's doubleToLongBits and then encoded in
# little-endian format.
def write_double(datum)
- @writer.write([datum].pack('E'))
+ @writer.write([datum].pack('E'.freeze))
end
# Bytes are encoded as a long followed by that many bytes of data.
@@ -208,7 +225,7 @@ def write_bytes(datum)
# A string is encoded as a long followed by that many bytes of
# UTF-8 encoded character data
def write_string(datum)
- datum = datum.encode('utf-8') if datum.respond_to? :encode
+ datum = datum.encode('utf-8'.freeze) if datum.respond_to? :encode
write_bytes(datum)
end
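The version-dependent `read_and_unpack` helper in the diff can be exercised outside Avro. The sketch below is a minimal stand-in that assumes only the Ruby standard library: `SketchDecoder` is a hypothetical class (not Avro's actual decoder) that copies the patched `byte!`, `read_float`, `read_double` and `read_and_unpack` bodies and reads from a `StringIO` instead of a file. The test values 7, 1.5 and 2.25 are arbitrary and chosen because they are exactly representable.

```ruby
require 'stringio'

# Hypothetical stand-alone decoder mirroring the patched helpers;
# it is not Avro's real decoder class.
class SketchDecoder
  def initialize(io)
    @reader = io
  end

  # IO#readbyte (and StringIO#readbyte) returns an Integer in 0..255
  # without allocating an intermediate String or Array; it raises
  # EOFError at end of stream.
  def byte!
    @reader.readbyte
  end

  # 4-byte little-endian IEEE 754 single precision.
  def read_float
    read_and_unpack(4, 'e'.freeze)
  end

  # 8-byte little-endian IEEE 754 double precision.
  def read_double
    read_and_unpack(8, 'E'.freeze)
  end

  private

  # Same feature check as the diff: the implementation is chosen once,
  # when the class body is evaluated, not on every call.
  if String.instance_methods.include?(:unpack1)
    def read_and_unpack(byte_count, format)
      @reader.read(byte_count).unpack1(format)
    end
  else
    def read_and_unpack(byte_count, format)
      @reader.read(byte_count).unpack(format)[0]
    end
  end
end

data = [7].pack('C') + [1.5].pack('e') + [2.25].pack('E')
decoder = SketchDecoder.new(StringIO.new(data))
p decoder.byte!       # => 7
p decoder.read_float  # => 1.5
p decoder.read_double # => 2.25
```

Selecting the method body at class-definition time keeps the per-call path free of a `respond_to?`-style check, which matters in a hot decoding loop.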
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
> Performance improvement in ruby binary decoder and encoder
> ----------------------------------------------------------
>
> Key: AVRO-2281
> URL: https://issues.apache.org/jira/browse/AVRO-2281
> Project: Apache Avro
> Issue Type: Improvement
> Components: ruby
> Affects Versions: 1.9.0
> Reporter: Kyle Phelps
> Priority: Minor
>
> The Ruby binary decoder has some inefficient memory usage patterns. The
> decoding process relies heavily on `unpack`, which allocates a result array
> even though only its first element is used. Ruby 2.4 adds `unpack1`, an
> optimized variant that returns the first value directly and avoids
> allocating the unused array. In `byte!`, we can go one step further and
> use the `readbyte` method provided by the `IO` class; this improves the
> performance of `byte!` by about 50%. Additionally, a few string literals in
> the encoder and decoder should be frozen to avoid unnecessary string
> allocations.
>
> With these changes, I've seen about a 20% performance improvement when
> decoding.
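The allocation argument in the issue description can be checked directly by counting objects. This is a minimal sketch assuming CRuby (MRI) 2.4 or later, since it relies on `String#unpack1` and the `GC.stat(:total_allocated_objects)` counter; the helper name `allocations`, the iteration count `N` and the 1.5 test value are arbitrary choices for illustration. It does not reproduce the ~50% and ~20% figures above, which come from the reporter's own measurements; the timing section prints numbers that will vary by Ruby version and machine.

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical helper: counts objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

N = 100_000
FLOAT_BYTES = [1.5].pack('e') # 4 bytes, little-endian float32

# unpack returns a one-element Array per call; unpack1 returns the value.
unpack_allocs  = allocations { N.times { FLOAT_BYTES.unpack('e'.freeze)[0] } }
unpack1_allocs = allocations { N.times { FLOAT_BYTES.unpack1('e'.freeze) } }

# read(1) allocates a String and unpack an Array; readbyte returns an Integer.
io = StringIO.new("\x01".b * N)
read_allocs = allocations { N.times { io.read(1).unpack('C'.freeze).first } }
io.rewind
readbyte_allocs = allocations { N.times { io.readbyte } }

# Without a frozen_string_literal magic comment, a bare literal yields a new
# String on each evaluation, while 'e'.freeze is deduplicated by the compiler.
literal_allocs = allocations { N.times { 'e' } }
frozen_allocs  = allocations { N.times { 'e'.freeze } }

puts "unpack[0]: #{unpack_allocs}, unpack1: #{unpack1_allocs}"
puts "read(1).unpack: #{read_allocs}, readbyte: #{readbyte_allocs}"
puts "'e': #{literal_allocs}, 'e'.freeze: #{frozen_allocs}"

# Wall-clock comparison; each report runs immediately, so rewinding the
# StringIO between reports gives every read loop a full buffer.
Benchmark.bm(16) do |x|
  x.report('unpack[0]') { N.times { FLOAT_BYTES.unpack('e'.freeze)[0] } }
  x.report('unpack1')   { N.times { FLOAT_BYTES.unpack1('e'.freeze) } }
  io.rewind
  x.report('read(1).unpack') { N.times { io.read(1).unpack('C'.freeze).first } }
  io.rewind
  x.report('readbyte') { N.times { io.readbyte } }
end
```

Counting allocations rather than only timing makes the effect of each change visible independently of machine noise; fewer short-lived objects means less GC work, which is where the decoding speedup is expected to come from.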
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)