[GitHub] [avro] tjwp commented on a change in pull request #918: AVRO-2677: Fully and correctly support decimal logical type in Ruby

GitBox Tue, 13 Oct 2020 04:12:57 -0700


tjwp commented on a change in pull request #918:
URL: https://github.com/apache/avro/pull/918#discussion_r503865563




##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -16,9 +16,209 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new isntance of a logical type using the provided schema
+      #
+      # @param schema [Avro::Schema]
+      #     The schema to use with this instance
+      #
+      # @raise [ArgumentError]
+      #     If the provided schema is nil
+      def initialize(schema)
+        raise ArgumentError, 'schema is required' if schema.nil?
+
+        @schema = schema
+      end
+
+      ##
+      # Encode the provided datum
+      #
+      # @param datum [Object] The datum to encode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def encode(datum)
+        raise NotImplementedError
+      end
+
+      ##
+      # Decode the provided datum
+      #
+      # @param datum [Object] The datum to decode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def decode(datum)
+        raise NotImplementedError
+      end
+    end
+
+    ##
+    # Logical type to handle arbitrary-precision decimals using byte array.
+    #
+    # The byte array contains the two's-complement representation of the 
unscaled integer
+    # value in big-endian byte order.
+    class BytesDecimal < LogicalTypeWithSchema
+      # Messages for exceptions
+      ERROR_INSUFICIENT_PRECISION = 'Precision is too small'.freeze
+      ERROR_INVALID_SCALE         = 'Scale must be greater than or equal to 
0'.freeze
+      ERROR_INVALID_PRECISION     = 'Precision must be positive'.freeze
+      ERROR_PRECISION_TOO_SMALL   = 'Precision must be greater than 
scale'.freeze
+      ERROR_ROUNDING_NECESSARY    = 'Rounding necessary'.freeze
+      ERROR_VALUE_MUST_BE_NUMERIC = 'value must be numeric'.freeze
+
+      # The pattern used to pack up the byte array (8 bit unsigned 
integer/char)
+      PACK_UNSIGNED_CHARS = 'C*'.freeze
+
+      # The number 10 as BigDecimal
+      TEN = BigDecimal(10).freeze
+
+      ##
+      # @return [Integer] The number of total digits supported by the decimal
+      attr_reader :precision
+
+      ##
+      # @return [Integer] The number of fractional digits
+      attr_reader :scale
+
+      ##
+      # Build a new decimal logical type
+      #
+      # @param schema [Avro::Schema]
+      #     The schema defining precision and scale for the conversion
+      #
+      # @raise [ArgumentError]
+      #     If precision or scale have invalid values. Precision must be
+      #     positive and greater or equal to scale. If scale is missing,
+      #     it defaults to 0.
+      def initialize(schema)
+        super
+
+        @scale     = schema.scale.to_i
+        @precision = schema.precision.to_i
+        @factor    = TEN ** @scale
+
+        validate!
+      end
+
+      ##
+      # Encode the provided value into a byte array
+      #
+      # @param value [BigDecimal, Float, Integer]
+      #     The numeric value to encode
+      #
+      # @raise [ArgumentError]
+      #     If the provided value is not a numeric type
+      #
+      # @raise [RangeError]
+      #     If the provided value has a scale higher than the schema permits,
+      #     or does not fit into the schema's precision
+      def encode(value)
+        raise ArgumentError, ERROR_VALUE_MUST_BE_NUMERIC unless 
value.is_a?(Numeric)
+
+        
to_byte_array(unscaled_value(value.to_d)).pack(PACK_UNSIGNED_CHARS).freeze
+      end
+
+      ##
+      # Decode a byte array (in form of a string) into a BigDecimal of the
+      # given precision and scale
+      #
+      # @param stream [String]
+      #     The byte array to decode
+      #
+      # @return [BigDecimal]
+      def decode(stream)
+        from_byte_array(stream) / @factor
+      end
+
+      private
+
+      ##
+      # Convert the provided stream of bytes into the unscaled value
+      #
+      # @param stream [String]
+      #     The stream of bytes to convert
+      #
+      # @return [Integer]
+      def from_byte_array(stream)
+        bytes    = stream.bytes
+        positive = bytes.first[7].zero?
+        total    = 0
+
+        bytes.each_with_index do |value, ix|
+          total += (positive ? value : (value ^ 0xff)) << (bytes.length - ix - 
1) * 8
+        end
+
+        return total if positive
+
+        -(total + 1)
+      end
+
+      ##
+      # Convert the provided number into its two's complement representation
+      # in network order (big endian).
+      #
+      # @param number [Integer]
+      #     The number to convert
+      #
+      # @return [Array<Integer>]
+      #     The byte array in network order
+      def to_byte_array(number)
+        [].tap do |result|
+          loop do
+            result.unshift(number & 0xff)
+            number >>= 8
+
+            break if (number == 0 || number == -1) && (result.first[7] == 
number[7])
+          end
+        end
+      end
+
+      ##
+      # Get the unscaled value from a BigDecimal considering the schema's scale
+      #
+      # @param decimal [BigDecimal]
+      #     The decimal to get the unscaled value from
+      #
+      # @return [Integer]
+      def unscaled_value(decimal)
+        details = decimal.split
+        length  = details[1].length
+
+        fractional_part = length - details[3]
+        raise RangeError, ERROR_ROUNDING_NECESSARY if fractional_part > scale
+
+        if length > precision || (length - fractional_part) > (precision - 
scale)
+          raise RangeError, ERROR_INSUFICIENT_PRECISION
+        end
+
+        (decimal * @factor).to_i
+      end
+
+      ##
+      # Make sure the schema definition is legit
+      #
+      # @raise [ArgumentError] If scale or precision are not quite right
+      def validate!
+        raise ArgumentError, ERROR_INVALID_SCALE if scale.negative?
+        raise ArgumentError, ERROR_INVALID_PRECISION unless precision.positive?
+        raise ArgumentError, ERROR_PRECISION_TOO_SMALL if precision < scale
+      end

Review comment:
       I do not think this optimization is premature, and should be addressed 
before this is merged. There have been cases in the past where a single 
additional hash allocation for each datum encoded has had a material impact on 
performance.
   
   Addressing this would fit with validating the decimal logical type when the 
schema is parsed and detecting errors early.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [avro] tjwp commented on a change in pull request #918: AVRO-2677: Fully and correctly support decimal logical type in Ruby

Reply via email to