ziggythehamster opened a new pull request #918: URL: https://github.com/apache/avro/pull/918
### Jira

- [x] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
  - https://issues.apache.org/jira/browse/AVRO-2677
  - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
    - `memory_profiler` is MIT licensed and is only required for testing

### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  - Unit tests added for decimal logical type encoding/decoding according to the Avro specification
  - Unit tests added to ensure performance regressions are not unknowingly introduced in the encoder/decoder, as we have made an effort to make it as performant as possible

### Commits

- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - README/CHANGELOG not updated, but the logical type support from AVRO-2677 should have included this encoder/decoder rather than leaving it up to consumers to shove the correct bytes in there. AVRO-2677 is closed, and we are unsure of the appropriate way to handle that. Should we have opened a new issue because AVRO-2677 was not fully implemented, or reopened it? If the former, and someone wants to create that ticket, we're happy to rewrite our commit messages with the correct ticket number.

### Supersedes PRs

The following PRs are currently open and implement an incorrect version of this feature, and should be closed:

* https://github.com/apache/avro/pull/829
* https://github.com/apache/avro/pull/840

Both PRs shove a string like `"1.234"` into the bytes, rather than encoding the value according to the specification. Neither PR validates inputs or introduces infrastructure to do so.

### Notes

* The Avro specification is imprecise about how decimals are to be implemented, which required us to dig into the source code of Avro for Java as well as into Java's BigDecimal and BigInteger to make sure we were doing the same thing. Perhaps the specification could include a Java one-liner that implements the encoder/decoder? Here's a Scala one-liner that we used to test our implementation:
  ```scala
  val encoded = new java.math.BigDecimal("3.4562").setScale(6).unscaledValue().toByteArray()
  val decoded = new java.math.BigDecimal(new java.math.BigInteger(encoded), 6)
  encoded.map("%02x".format(_)).mkString(" ") // 34 bc c8: String
  decoded // 3.456200: java.math.BigDecimal
  ```
* We tested this in Ruby 2.3, 2.4, 2.5, 2.6, and 2.7. This is the reason for the `<=` check for retained objects, as some Ruby versions retain objects where others don't. We think this is either a bug in `memory_profiler` or a bug in Ruby itself.
* Your build system depends on the `echoe` gem, but `echoe` is not compatible with RubyGems > 2.7. RubyGems 3.x has been out since 2018, and RubyGems 2.7 barely works in newer versions of Ruby. Consider upgrading this.
* This PR is against master, but 1.9 is the current stable version. [This branch](https://github.com/art19/avro/tree/art19-patched-1.9-with-pr-761) is a version based on 1.9 with #761 incorporated (#761 was the PR that incompletely implemented AVRO-2677).
* There's [a gem published to GitHub packages](https://github.com/art19/avro/packages/262653?version=1.9.3.pre.b88b65e2) as well, if you're like us and need a version with decimal support before this hits an official channel.

### Thanks

I'm filing the upstream PR here, but @johvet did almost all of the work, performance tuning, and testing.
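
For readers who want to see the decimal encoding from the Notes in Ruby terms: the value is scaled to an integer, and that unscaled integer is written as a big-endian two's-complement byte string of minimal length, which is what Java's `BigInteger#toByteArray` produces. The following is a minimal sketch under that assumption, not the code from this PR. The method names `encode_decimal` and `decode_decimal` are hypothetical, and the sketch skips precision checks and assumes a non-empty byte string when decoding.

```ruby
require 'bigdecimal'

# Hypothetical helper (not this PR's API): encode a decimal as the minimal
# big-endian two's-complement bytes of its unscaled integer value.
def encode_decimal(value, scale)
  scaled = BigDecimal(value) * (10**scale)
  # Assumption for this sketch: reject values that do not fit the scale exactly.
  raise ArgumentError, "#{value} does not fit scale #{scale}" unless scaled.frac.zero?

  unscaled = scaled.to_i
  bytes = []
  loop do
    byte = unscaled & 0xff
    bytes.unshift(byte)
    unscaled >>= 8 # arithmetic shift, so negative values converge to -1
    # Stop once the remainder is pure sign extension and the top bit of the
    # last byte written already carries the correct sign.
    break if (unscaled.zero? && byte < 0x80) || (unscaled == -1 && byte >= 0x80)
  end
  bytes.pack('C*')
end

# Hypothetical helper: decode big-endian two's-complement bytes back into a
# BigDecimal with the given scale.
def decode_decimal(bytes, scale)
  unscaled = bytes.unpack('C*').reduce(0) { |acc, b| (acc << 8) | b }
  # A set top bit in the first byte means the value is negative.
  unscaled -= 1 << (8 * bytes.bytesize) if bytes.getbyte(0) >= 0x80
  BigDecimal("#{unscaled}e-#{scale}")
end

encoded = encode_decimal('3.4562', 6)
puts encoded.unpack('C*').map { |b| format('%02x', b) }.join(' ') # 34 bc c8
puts decode_decimal(encoded, 6).to_s('F')                         # 3.4562
```

The byte output matches the Scala check in the Notes (`34 bc c8`). Unlike Java's `BigDecimal`, Ruby's `BigDecimal` does not keep trailing zeros, so the decoded value prints as `3.4562` rather than `3.456200`.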