[jira] [Work logged] (AVRO-3408) Schema evolution with logical types

ASF GitHub Bot (Jira) Mon, 21 Mar 2022 06:22:22 -0700


     [ 
https://issues.apache.org/jira/browse/AVRO-3408?focusedWorklogId=745045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-745045
 ]


ASF GitHub Bot logged work on AVRO-3408:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Mar/22 13:21
            Start Date: 21/Mar/22 13:21
    Worklog Time Spent: 10m 
      Work Description: izemlyanskiy commented on pull request #1584:
URL: https://github.com/apache/avro/pull/1584#issuecomment-1073890523


   Pardon my ignorance, but @RyanSkraba and @rstata are two 
different people, right? 
   @RyanSkraba, thank you for self-requesting a review, I look forward to your opinion 
on this PR :pray: 
   @rstata, you were the last person who touched 
`org.apache.avro.io.parsing.ResolvingGrammarGenerator`, so I have a question for 
you. In my PR we create a new instance of `GenericDatumReader` on every 
conversion. It works fine, but it might be inefficient.
   I thought about creating such a reader in `ResolvingAction`, or even creating a new 
`Action` and delegating all the conversion work to that action. But for 
that, we need a reference to `org.apache.avro.generic.GenericData` in 
`org.apache.avro.io.ResolvingDecoder#resolve`. I made an attempt, and it can 
be done without harming other code, but I didn't dare to offer such code 
without a discussion. 
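   For illustration only, and not the change made in the PR: one generic way to 
avoid building a reader on every conversion is to memoize it per writer/reader 
schema pair. The sketch below uses only JDK types. `ConverterCacheSketch`, 
`converterFor`, `buildCount` and the string schema keys are hypothetical 
stand-ins, and the inner lambda stands in for an expensive-to-build reader such 
as `GenericDatumReader`.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: build a converter once per (writer, reader) schema pair.
class ConverterCacheSketch {
    // Key: the schema pair as a two-element list (lists compare by content).
    private final Map<List<String>, Function<byte[], String>> cache = new ConcurrentHashMap<>();
    // Counts how many times the (assumed costly) construction actually ran.
    private final AtomicInteger builds = new AtomicInteger();

    Function<byte[], String> converterFor(String writerSchema, String readerSchema) {
        return cache.computeIfAbsent(Arrays.asList(writerSchema, readerSchema), key -> {
            builds.incrementAndGet();
            // Placeholder conversion: interpret the bytes as a big-endian integer.
            Function<byte[], String> converter = bytes -> new BigInteger(bytes).toString();
            return converter;
        });
    }

    int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        ConverterCacheSketch cache = new ConverterCacheSketch();
        for (int i = 0; i < 1000; i++) {
            cache.converterFor("writer-schema", "reader-schema").apply(new byte[] {0x12});
        }
        System.out.println("builds: " + cache.buildCount()); // prints "builds: 1"
    }
}
```

   A cache like this trades a map lookup per conversion for not having to thread 
`GenericData` through the decoder, so it is an alternative to, not a 
replacement for, the parameter-passing approach described in this comment.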
   Long story short, my suggestion is to add a `GenericData` parameter to 
   `org.apache.avro.io.DecoderFactory#resolvingDecoder`,
   pass it down to the `org.apache.avro.io.ResolvingDecoder` constructor and the 
method `org.apache.avro.io.ResolvingDecoder#resolve`, and finally store it in a field 
of `org.apache.avro.io.parsing.ResolvingGrammarGenerator` so that it can be used in 
`org.apache.avro.io.parsing.ResolvingGrammarGenerator#generate(org.apache.avro.Resolver.Action, java.util.Map<java.lang.Object,org.apache.avro.io.parsing.Symbol>)` 
at this point: 
   ```java
       if (action instanceof Resolver.Promote) {
         return Symbol.resolve(action.writer, action.reader, simpleGen(action.writer, seen),
             simpleGen(action.reader, seen));
   ```
   (presumably inside the `Symbol.resolve` method)
   
   Thank you for your time. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 745045)
    Time Spent: 3h 20m  (was: 3h 10m)

> Schema evolution with logical types 
> ------------------------------------
>
>                 Key: AVRO-3408
>                 URL: https://issues.apache.org/jira/browse/AVRO-3408
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Ivan Zemlyanskiy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both 
> technology and code culture points of view. (y)
> I know you recommend migrating schema by adding a new field and removing the 
> old one in the future, but please-please-please consider my case as well. 
> In my company, we have some DTOs with 200+ fields in total that we 
> encode with Avro and send over the network. About a third of them have the type 
> `java.math.BigDecimal`. At some point, we discovered that we send them with a 
> schema like
> {code:json}
> {
>   "name":"performancePrice",
>   "type":{
>     "type":"string",
>     "java-class":"java.math.BigDecimal"
>   }
> }
> {code}
> That is something of a disaster for us because we have a fairly high load, about 2 
> million RPS. 
> So we started thinking about migrating to something lighter than strings (no 
> blame for choosing it as the default; I know BigDecimal has a lot of pitfalls, 
> and a string is the easiest way to encode and decode it).
> A standard precision for all such fields was acceptable to us, so we found 
> `Conversions.DecimalConversion` and decided in the end to 
> use this logical type with a recommended schema like
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.BYTES);
>         LogicalTypes.Decimal decimalType = LogicalTypes.decimal(
>                 MathContext.DECIMAL32.getPrecision(), DecimalUtils.MONEY_ROUNDING_SCALE);
>         decimalType.addToSchema(schema);
>         return schema;
>     }
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is: how do we migrate to such 
> a schema? 
> As I said, we have a lot of such fields, and migrating all of them by 
> duplicating fields and removing the old ones later would be painful and would 
> add considerable overhead.
> I ran some tests and found that if two applications register the same 
> `BigDecimalConversion`, but for one application `getRecommendedSchema()` 
> is like the method above and for the other application 
> `getRecommendedSchema()` is
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.STRING);
>         schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>         return schema;
>     }
> {code}
> so they can easily read each other's messages using the _SERVER_ schema.
> So I built two applications and wired them up with `ProtocolRepository`, 
> `ReflectResponder` and all that stuff, and found out that it doesn't work, because 
> `org.apache.avro.io.ResolvingDecoder` ignores logical types entirely for some 
> reason. 
> As a result, one application explicitly says "I encode this field as a 
> byte array, which is supposed to be the logical type 'decimal' with precision N", 
> but the other application just tries to convert those bytes to a string and 
> build a BigDecimal from the resulting string. We end up with
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect 
> logical types in the _SERVER_ (_ACTUAL_) schema and use the corresponding 
> conversion instance for reading values. In my example, I'd say it might be 
> {code}
> ResolvingDecoder#readString()
>   -> read the actual logical type
>   -> find BigDecimalConversion instance
>   -> conversion.fromBytes(readValueWithActualSchema())
>   -> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that. 
> Thank you in advance for your time, and sorry for the long issue description. 
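
The decoding path proposed in the issue description above can be sketched with 
JDK classes alone. This is a minimal illustration, not Avro code: it assumes the 
Avro `decimal` convention of writing the unscaled value as big-endian 
two's-complement bytes with the scale kept in the schema, and 
`DecimalResolutionSketch` and all of its method names are hypothetical. It also 
reproduces the kind of failure described in the issue, where the bytes are 
treated as text and parsed.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical, JDK-only sketch of "bytes with decimal logical type -> string".
class DecimalResolutionSketch {

    // Writer side: keep only the unscaled value; the scale is assumed to live in the schema.
    // setScale(int) throws ArithmeticException if the value would need rounding.
    static byte[] toDecimalBytes(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Reader side, step 1: rebuild the BigDecimal using the writer's scale.
    static BigDecimal fromDecimalBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // Reader side, step 2: render the value as the string the reader's schema expects.
    static String resolveToString(byte[] bytes, int writerScale) {
        return fromDecimalBytes(bytes, writerScale).toString();
    }

    // The failing path: ignore the logical type, decode the bytes as text, then parse.
    static BigDecimal naiveStringRead(byte[] bytes) {
        return new BigDecimal(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("12345.67");
        byte[] wire = toDecimalBytes(price, 2);

        System.out.println(resolveToString(wire, 2)); // prints 12345.67
        System.out.println("bytes on the wire: " + wire.length
                + ", UTF-8 string bytes: " + price.toString().getBytes(StandardCharsets.UTF_8).length);

        try {
            naiveStringRead(wire);
        } catch (NumberFormatException e) {
            System.out.println("naive read failed: NumberFormatException");
        }
    }
}
```

For this sample value the unscaled bytes are shorter than the string form, which 
is the motivation given in the issue for moving away from string encoding.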



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
