[
https://issues.apache.org/jira/browse/AVRO-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508373#comment-17508373
]
Ryan Skraba commented on AVRO-3408:
-----------------------------------
Thanks for the interesting and well-described use case! I turned on GitHub
actions for your PR, and my apologies for taking so long to get to it!
Before Avro 1.8, the only way to get an in-memory representation of
BigDecimal was to use a
[Stringable|https://avro.apache.org/docs/1.11.0/api/java/org/apache/avro/reflect/Stringable.html]
(as you originally did). As you've discovered, it isn't a great
representation, and logical types are much more concise. There's some special
handling for this in Java (the evidence being the "isStringable" methods you
find here and there).
The *{{decimal}}* logical type was designed to apply only to the two binary
types (bytes and fixed). Unfortunately, your PR extends this to bytes,
fixed and string... I don't have any strong objection of course, but this
should probably be discussed on the mailing list and changed in the
[specification|https://avro.apache.org/docs/current/spec.html#Decimal] (and a
spec change is probably not something that should go into the next minor release).
To be honest, the best approach (in Java) might be to just hard-code the
conversion of Stringable BigDecimal to logicalType BigDecimal when it's
detected!
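For illustration, the spec's decimal encoding can be sketched without the Avro runtime at all: a decimal is stored as its unscaled integer value in big-endian two's-complement bytes, with the scale coming from the schema, which is where the size win over a Stringable string comes from. The helper names below ({{toBytes}}, {{fromBytes}}) are stand-ins, not Avro API:

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalEncodingSketch {
    // Per the Avro spec, a decimal is stored as its unscaled integer value
    // in big-endian two's-complement form; the scale lives in the schema.
    static byte[] toBytes(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("123.45");
        byte[] encoded = toBytes(price, 2);  // unscaled 12345 -> 2 bytes {0x30, 0x39}
        // The Stringable form "123.45" would take 6 bytes as UTF-8.
        System.out.println(encoded.length + " bytes -> " + fromBytes(encoded, 2));
    }
}
{code}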
> Schema evolution with logical types
> ------------------------------------
>
> Key: AVRO-3408
> URL: https://issues.apache.org/jira/browse/AVRO-3408
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.11.0
> Reporter: Ivan Zemlyanskiy
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both
> technology and code culture points of view. (y)
> I know you recommend migrating a schema by adding a new field and removing the
> old one later, but please-please-please consider my case as well.
> In my company, we have some DTOs (200+ fields in total) that we encode with
> Avro and send over the network. About a third of them have type
> `java.math.BigDecimal`. At some point, we discovered we were sending them with
> a schema like
> {code:json}
> {
>   "name": "performancePrice",
>   "type": {
>     "type": "string",
>     "java-class": "java.math.BigDecimal"
>   }
> }
> {code}
> That's something of a disaster for us because we run at a pretty high load,
> around 2 million RPS.
> So we started thinking about migrating to something lighter than strings (no
> blame for choosing string as the default; I know BigDecimal has a lot of
> pitfalls, and string is the easiest type to encode/decode).
> It was fine to standardize the precision for all such fields, so we found
> `Conversions.DecimalConversion` and decided that, at the end of the day, we
> were going to use this logical type with a recommended schema like
> {code:java}
> @Override
> public Schema getRecommendedSchema() {
>     Schema schema = Schema.create(Schema.Type.BYTES);
>     LogicalTypes.Decimal decimalType = LogicalTypes.decimal(
>         MathContext.DECIMAL32.getPrecision(), DecimalUtils.MONEY_ROUNDING_SCALE);
>     decimalType.addToSchema(schema);
>     return schema;
> }
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is how to migrate to such a
> schema?
> As I said, we have a lot of such fields, and migrating all of them by
> duplicating fields and removing the old ones later would be painful and would
> cost us considerable overhead.
> I made some tests and found out that if two applications register the same
> `BigDecimalConversion`, but for one application `getRecommendedSchema()` is
> like the method above and for the other application it is
> {code:java}
> @Override
> public Schema getRecommendedSchema() {
>     Schema schema = Schema.create(Schema.Type.STRING);
>     schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>     return schema;
> }
> {code}
> then they can easily read each other's messages using the _SERVER_ schema.
> So I made two applications, wired them up with `ProtocolRepository`,
> `ReflectResponder` and all that stuff, and found out it doesn't work, because
> `org.apache.avro.io.ResolvingDecoder` totally ignores logical types for some
> reason.
> In the end, one application specifically said "I encode this field as a byte
> array which is supposed to be the logical type 'decimal' with precision N",
> but the other application just tried to convert those bytes to a string and
> build a BigDecimal from the resulting string. As a result, we got
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit
> number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect
> logical types in the _SERVER_ (_ACTUAL_) schema and use a corresponding
> conversion instance when reading values. In my example, that might be
> {code}
> ResolvingDecoder#readString()
>   -> read the actual logical type
>   -> find the BigDecimalConversion instance
>   -> conversion.fromBytes(readValueWithActualSchema())
>   -> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that.
> Thank you in advance for your time, and sorry for the long issue description.
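The reader-side shim proposed in the flow above can be approximated without the Avro runtime: decode the bytes with the writer's decimal logical type, then hand the reader the string form it expects. The method names here ({{decodeWriterDecimal}}, {{toReaderString}}) are illustrative stand-ins for a hypothetical ResolvingDecoder hook, not Avro API:

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

public class ResolvingShimSketch {
    // Writer side: the field arrived as decimal-logical-type bytes
    // (unscaled value, big-endian two's complement; scale from the writer schema).
    static BigDecimal decodeWriterDecimal(byte[] writerBytes, int writerScale) {
        return new BigDecimal(new BigInteger(writerBytes), writerScale);
    }

    // Reader side: the reader schema expects the Stringable form,
    // i.e. the string a Stringable BigDecimal reader would parse.
    static String toReaderString(BigDecimal value) {
        return value.toString();
    }

    public static void main(String[] args) {
        byte[] wire = {0x30, 0x39};                        // unscaled 12345
        BigDecimal decoded = decodeWriterDecimal(wire, 2); // scale 2 -> 123.45
        System.out.println(toReaderString(decoded));
    }
}
{code}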
--
This message was sent by Atlassian Jira
(v8.20.1#820001)