[ 
https://issues.apache.org/jira/browse/AVRO-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503083#comment-17503083
 ] 

Ivan Zemlyanskiy commented on AVRO-3408:
----------------------------------------

We found even more danger case for this type of migration or just accidental 
type mismatch:
the exception as I described above is the best luck in this scenario. Imagine 
this:
A writer has a schema
{code:json}
{
  "type" : "string",
  "java-class" : "java.math.BigDecimal"
}
{code}
and a reader has a schema:
{code:json}
{
  "type" : "bytes",
  "logicalType" : "decimal",
  "precision" : 16,
  "scale" : 5
}
{code}
The current behavior is 
*ResolvingDecoder* at the method *readBytes* calls *advance* on a parser and 
because of the type mismatch the parser calls decoder's *doAction* where we try 
to decide what's the result type should be. But I'm kind of baffled by [this 
logic|https://github.com/apache/avro/blob/master/lang%2Fjava%2Favro%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Favro%2Fio%2FResolvingDecoder.java#L294-L300].
 Avro just says "okay, whatever, just use writer schema" regardless what were 
the types and what's the kind of the original type mismatch. 
So  at the end of the day we just [treat the incoming string as a byte 
array|https://github.com/apache/avro/blob/master/lang%2Fjava%2Favro%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Favro%2Fio%2FResolvingDecoder.java#L236-L240]
 and make a BigDecimal from it... 
And I can tell you the truth, it's a time bomb which exploded our production 
once. Because from  
{code:java}
new BigDecimal("123.321");
{code}
after decoding we got 
{code:java}
new BigDecimal("138474692586.50161");
{code}
without any warnings,exceptions, etc. 
That was a remarkable day, I dare say =) 

No blame for Avro, of course that was our mistake and we paid our price for 
that. But I thought I got to do my best to stop anything like that for anyone 
else. 

Thanks for reading and I'd love to read any thoughts from the community.  

> Schema evolution with logical types 
> ------------------------------------
>
>                 Key: AVRO-3408
>                 URL: https://issues.apache.org/jira/browse/AVRO-3408
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Ivan Zemlyanskiy
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both 
> technology and code culture points of view. (y)
> I know you recommend migrating schema by adding a new field and removing the 
> old one in the future, but please-please-please consider my case as well. 
> In my company, we have some DTOs, and it's about 200+ fields in total that we 
> encode with Avro and send over the network. About a third of them have type 
> `java.math.BigDecimal`. At some point, we discovered we send them with a 
> schema like
> {code:json}
> {
>   "name":"performancePrice",
>   "type":{
>     "type":"string",
>     "java-class":"java.math.BigDecimal"
>   }
> }
> {code}
> That's a kind of disaster for us cos we have pretty much a high load with ~2 
> million RPS. 
> So we start to think about migrating to something lighter than strings (no 
> blame for choosing it as a default, I know BigDecimal has a lot of pitfalls, 
> and string is the easiest way for encoding/decoding).
> It was fine to make a standard precision for all such fields, so we found 
> `Conversions.DecimalConversion` and decided at the end of the day we were 
> going to use this logical type with a recommended schema like
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.BYTES);
>         LogicalTypes.Decimal decimalType =
>                 LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(), 
> DecimalUtils.MONEY_ROUNDING_SCALE);
>         decimalType.addToSchema(schema);
>         return schema;
>     }
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is how to migrate to such 
> schema? 
> As I said, we have a lot of such fields, and migrating all of them with 
> duplication fields with future removal might be painful and would cost us a 
> considerable overhead.
> I made some tests and found out if two applications register the same 
> `BigDecimalConversion` but for one application the `getRecommendedSchema()` 
> is like the method above and for another application the 
> `getRecommendedSchema()` is
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.STRING);
>         schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>         return schema;
>     }
> {code}
> so they can easily read each other messages using _SERVER_ schema.
> So, I made two applications and wired them up with `ProtocolRepository`, 
> `ReflectResponder` and all that stuff, I found out it doesn't work. Because 
> `org.apache.avro.io.ResolvingDecoder` totally ignores logical types for some 
> reason. 
> So as a result, one application specifically told "I encode this field as a 
> byte array which supposed to be a logical type 'decimal' with precision N", 
> but another application just tries to convert those bytes to a string and 
> make a BigDecimal based on the result string. As a result, we got
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit 
> number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect 
> logical types in _SERVER_ (_ACTUAL_) schema and use a corresponding 
> conversion instance for reading values. In my example, I'd say it might be 
> {code}
> ResolvingDecoder#readString() -> read the actual logical type -> find 
> BigDecimalConversion instance -> 
> conversion.fromBytes(readValueWithActualSchema()) -> 
> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that. 
> Thank you in advance for your time, and sorry for the long issue description. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to