[jira] [Work logged] (AVRO-3048) Using builders leads to performance degradation

ASF GitHub Bot (Jira) Wed, 12 Jan 2022 17:33:07 -0800


     [ 
https://issues.apache.org/jira/browse/AVRO-3048?focusedWorklogId=707984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707984
 ]


ASF GitHub Bot logged work on AVRO-3048:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jan/22 01:32
            Start Date: 13/Jan/22 01:32
    Worklog Time Spent: 10m 
      Work Description: radai-rosenblatt commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1011640589


   Hey folks.
   
   I'd like to elaborate a little here about why the avro-util project exists. 
we can then see if involving upstream avro makes sense in anything.
   
   TL;DR - when using avro in a large, diverse ecosystem, supporting multiple 
versions of avro simultaneously across the ecosystem is required. there are a 
few things that would make this work easier, listed near the bottom.
   
   the main reasons are code/schema reuse and portable libraries. LinkedIn uses 
avro for encoding across a lot of its "data plane" (kafka being a good 
example). a lot of these payloads are "shared models" - data produced by 
service X that is consumed by services Y and Z over a kafka topic, or any other 
storage medium or rpc. as such all parties involved must have "the same 
schema". this is especially true for avro - which requires the exact writer 
schema on-hand for decoding - vs other serialization formats. the best way to 
guarantee all parties have the same schema is to have the schema(s) in question 
in their own library, imported by all parties. 
   
   schemas are not the only thing shared here, code is as well: it's much more 
convenient for developers to operate on generated POJO classes than it is to 
operate on generic records, and there are libraries whose top level APIs 
accept/return IndexedRecords, meaning they expose avro.
   
   given the sheer number of codebases involved, and that some constraints on 
avro are beyond the ability of an organization to control (dictated by 
requirements of 3rd party external libraries), it's not feasible to align on a 
single version of avro across the organization, and individual codebases change 
their version of avro on their own schedules.
   
   and so, even maintaining a library of generated record classes, not to 
mention writing a "portable" library that accepts/returns IndexedRecords, 
requires navigating a very wide range of runtime avro versions. we do so by a 
combination of adapters (different implementations of "the same thing" for 
different avro versions) and tweaking the generated specific classes.
   
   it's obviously unreasonable to expect the avro API to never change. however, 
there have been previous breaking changes that could have been done "nicer". 
some examples are below if you're curious.
   
   things the avro project could change to make maintaining avro-util easier:
   
   - document breaking changes to APIs and/or behaviors. currently we try 
"guessing" those by looking at commit logs, but more often than not our users 
find them first 
   - consider adding options to better support cross-version compatibility. 
this is especially painful in generated code, where something as simple as not 
adding an @Override annotation to new methods immediately avoids a surprising 
number of compilation issues. having default implementations for new methods in 
the parent abstract classes would also help a great deal.
   - "phase in" breaking changes with a period of error logs?
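   
   As an illustration of the second point, here is a minimal Java sketch. 
Every name in it (OldRecordBase, NewRecordBase, hasFastPath and the "generated" 
classes) is a hypothetical stand-in for two versions of a library base class, 
not avro's real API. It only shows why omitting @Override on a newly introduced 
method, and giving that method a default body in the parent, lets the same 
generated source work on both sides of the change.

```java
// Hypothetical stand-ins for two versions of a library base class.
// None of these names come from avro itself.
abstract class OldRecordBase {
    public abstract Object get(int field);
}

abstract class NewRecordBase {
    public abstract Object get(int field);

    // Added in the "new" version WITH a default body, so subclasses
    // generated before this method existed still compile and run.
    public boolean hasFastPath() {
        return false;
    }
}

// "Generated" class compiled against the old base. hasFastPath() is just an
// extra method here; annotating it with @Override would be a compile error.
class UserOnOld extends OldRecordBase {
    private final String name;

    UserOnOld(String name) {
        this.name = name;
    }

    @Override // safe: get(int) exists in both versions of the base class
    public Object get(int field) {
        if (field == 0) {
            return name;
        }
        throw new IndexOutOfBoundsException("no field " + field);
    }

    // Deliberately no @Override: the method only exists in the new base.
    public boolean hasFastPath() {
        return true;
    }
}

// The same generated source compiled against the new base, where
// hasFastPath() now overrides the parent's default.
class UserOnNew extends NewRecordBase {
    private final String name;

    UserOnNew(String name) {
        this.name = name;
    }

    @Override
    public Object get(int field) {
        if (field == 0) {
            return name;
        }
        throw new IndexOutOfBoundsException("no field " + field);
    }

    public boolean hasFastPath() {
        return true;
    }
}

// A class produced by an older generator that knows nothing about
// hasFastPath(); it inherits the default from the new base.
class LegacyOnNew extends NewRecordBase {
    @Override
    public Object get(int field) {
        return null;
    }
}

class CrossVersionDemo {
    public static void main(String[] args) {
        System.out.println(new UserOnOld("a").hasFastPath());
        System.out.println(new UserOnNew("a").hasFastPath());
        System.out.println(new LegacyOnNew().hasFastPath());
    }
}
```

   The trade-off is that without @Override the compiler no longer catches a 
misspelled method name, so this only makes sense as an opt-in mode for code 
that must compile against several versions.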
   
   examples of past breaking changes that could have been less breaking:
   
   - the json format for union encoding was changed between 1.4 and 1.5 to use 
a fullname as discriminator instead of simple name. the parser could have been 
made to accept both forms for cases where the simple names are distinct to make 
this a less breaking change (we eventually wrote our own version of JsonDecoder 
that does exactly that)
   - in more recent memory (https://issues.apache.org/jira/browse/AVRO-2035) 
validation for default values was tightened - which is great. but some things, 
like a default value of "true" (a json string) for a boolean field, could have 
been kept around initially (with a very nagging error printed at parse time)
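   
   The lenient union handling from the first example can be sketched roughly as 
follows. This is not the avro-util JsonDecoder and uses no avro classes; 
LenientUnionResolver and its methods are hypothetical names, and union 
branches are represented simply by their full names. A label matches on the 
full name first, and a simple name is accepted only when exactly one branch 
has it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves a JSON union label to a branch index,
// accepting either the full name or an unambiguous simple name.
final class LenientUnionResolver {
    private final List<String> fullNames;

    LenientUnionResolver(List<String> fullNames) {
        this.fullNames = new ArrayList<>(fullNames);
    }

    // "com.example.User" -> "User"; names without a namespace are unchanged.
    static String simpleName(String fullName) {
        int dot = fullName.lastIndexOf('.');
        return dot < 0 ? fullName : fullName.substring(dot + 1);
    }

    // Returns the branch index for the label, or -1 if the label matches
    // no branch or matches more than one branch by simple name.
    int resolve(String label) {
        int exact = fullNames.indexOf(label);
        if (exact >= 0) {
            return exact; // full-name form
        }
        int match = -1;
        for (int i = 0; i < fullNames.size(); i++) {
            if (simpleName(fullNames.get(i)).equals(label)) {
                if (match >= 0) {
                    return -1; // simple name is ambiguous, refuse to guess
                }
                match = i;
            }
        }
        return match; // simple-name form, or -1 when nothing matched
    }
}
```

   With branches "null", "com.example.User", "com.example.Order" and 
"org.other.Order", both "User" and "com.example.User" resolve to the same 
branch, while the bare label "Order" is rejected as ambiguous.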
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 707984)
    Time Spent: 2h 40m  (was: 2.5h)

> Using builders leads to performance degradation
> -----------------------------------------------
>
>                 Key: AVRO-3048
>                 URL: https://issues.apache.org/jira/browse/AVRO-3048
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2, 1.10.1
>            Reporter: Peter
>            Assignee: Martin Jubelgas
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When you do a .newBuilder() for avro generated classes, this will call
> org.apache.avro.specific.SpecificData.getForSchema:
>  
> public static SpecificData getForSchema(Schema reader) {
>     if (reader.getType() == Type.RECORD) {
>       final String className = getClassName(reader);
>       if (className != null) {
>         final Class<?> clazz;
>         try {
>           clazz = Class.forName(className);
>           return getForClass(clazz);
>         } catch (ClassNotFoundException e) {
>           // lookup failed: fall back to the default instance
>           return SpecificData.get();
>         }
>       }
>     }
>     return SpecificData.get();
> }
>  
> The Class.forName lookup inside the try block then seems to succeed only 
> seldom, and a lot of ClassNotFoundExceptions are thrown.
> Throwing exceptions internally carries a large performance penalty, and in 
> practice users of avro 1.9.x and 1.10.x in high performance applications are 
> forced not to use builders.
>  
> The same problem is also described at:
> [https://forums.databricks.com/questions/50803/orgapacheavrospecificspecificdatagetforschema-sear.html]
> The problem exists on at least 1.9.2 and 1.10.1 (but not on 1.7.x) in an 
> OSGi environment



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
