[
https://issues.apache.org/jira/browse/AVRO-3048?focusedWorklogId=707984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707984
]
ASF GitHub Bot logged work on AVRO-3048:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jan/22 01:32
Start Date: 13/Jan/22 01:32
Worklog Time Spent: 10m
Work Description: radai-rosenblatt commented on pull request #1333:
URL: https://github.com/apache/avro/pull/1333#issuecomment-1011640589
Hey folks.
I'd like to elaborate a little here about why the avro-util project exists.
we can then see if involving upstream avro makes sense in anything.
TL;DR - when using avro in a large, diverse ecosystem, supporting multiple
versions of avro simultaneously across the ecosystem is required. there are a few
things that would make this work easier, listed near the bottom.
the main reasons are code/schema reuse and portable libraries. LinkedIn uses
avro for encoding across a lot of its "data plane" (kafka being a good
example). a lot of these payloads are "shared models" - data produced by
service X that is consumed by services Y and Z over a kafka topic, or any other
storage medium or rpc. as such all parties involved must have "the same
schema". this is especially true for avro - which requires the exact writer
schema on-hand for decoding - vs other serialization formats. the best way to
guarantee all parties have the same schema is to have the schema(s) in question
in their own library, imported by all parties.
schemas are not the only thing shared here, code is as well: it's much more
convenient for developers to operate on generated POJO classes than it is to
operate on generic records, and there are libraries whose top level APIs
accept/return IndexedRecords, meaning they expose avro.
given the sheer number of codebases involved, and that some constraints on
avro are beyond the ability of an organization to control (dictated by
requirements of 3rd party external libraries), it's not feasible to align on a
single version of avro across the organization, and individual codebases change
their version of avro on their own schedules.
and so, even maintaining a library of generated record classes, not to
mention writing a "portable" library that accepts/returns IndexedRecords
requires navigating a very wide range of runtime avro versions. we do so by a
combination of adapters (different implementations of "the same thing" for
different avro versions) and tweaking the generated specific classes.
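
As a rough illustration of the adapter approach described above, here is a minimal sketch that picks an implementation at runtime depending on whether a marker class is on the classpath. The names `SchemaParsingAdapter`, `AdapterSelector`, `isPresent` and `select` are hypothetical and are not taken from avro-util or avro. Which marker class distinguishes which avro version is also left as an assumption for the caller to supply.

```java
import java.util.function.Supplier;

/** Hypothetical adapter interface: one operation whose implementation differs by library version. */
interface SchemaParsingAdapter {
    String describe();
}

final class AdapterSelector {
    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, AdapterSelector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Picks the "modern" adapter when the marker class is on the classpath,
     * otherwise the "legacy" one. Suppliers keep the unused adapter from ever
     * being instantiated, so it never touches APIs missing from the runtime version.
     */
    static SchemaParsingAdapter select(String markerClass,
                                       Supplier<SchemaParsingAdapter> modern,
                                       Supplier<SchemaParsingAdapter> legacy) {
        return isPresent(markerClass) ? modern.get() : legacy.get();
    }
}
```

The probe runs once at startup in this sketch, so the cost of a failed class lookup is paid a single time rather than on every call.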
it's obviously unreasonable to expect the avro API to never change. however,
there have been previous breaking changes that could have been done "nicer".
some examples are below if you're curious.
things the avro project could change to make maintaining avro-util easier:
- document breaking changes to APIs and/or behaviors. currently we try
"guessing" those by looking at commit logs but more often than not our users
find them first
- consider adding options to better support cross-version compatibility.
this is especially painful in generated code, where something as simple as not
adding an @Override annotation to new methods immediately avoids a surprising
number of compilation issues. having default implementations for new methods in
the parent abstract classes would also help a great deal.
- "phase in" breaking changes with a period of error logs?
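
To make the @Override and default-implementation points above concrete, here is a small sketch. `BaseRecord`, `GeneratedRecord` and `supportsFastPath` are hypothetical stand-ins, not actual avro classes or methods. The idea is only that a generated class written this way would compile against both an older base class (where the new method does not exist) and a newer one (where it does).

```java
/** Hypothetical stand-in for a library base class, as it might look in a NEWER release. */
abstract class BaseRecord {
    abstract Object get(int field);

    // Added in the newer release. Because it has a default body rather than being
    // abstract, subclasses generated for older releases still compile and run.
    boolean supportsFastPath() {
        return false;
    }
}

/** Hypothetical generated class meant to compile against old and new releases alike. */
class GeneratedRecord extends BaseRecord {
    private final Object[] values = {"a", 1};

    @Override // safe: get(int) exists in every release of the (hypothetical) base class
    Object get(int field) {
        return values[field];
    }

    // Deliberately no @Override: against an older base class that lacks this method
    // it is simply an extra method; against the newer one it overrides the default.
    boolean supportsFastPath() {
        return true;
    }
}
```

With @Override on `supportsFastPath`, the same source would fail to compile against the older base class, which is the kind of breakage described above.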
examples of past breaking changes that could have been less breaking:
- the json format for union encoding was changed between 1.4 and 1.5 to use
a fullname as discriminator instead of simple name. the parser could have been
made to accept both forms for cases where the simple names are distinct to make
this a less breaking change (we eventually wrote our own version of JsonDecoder
that does exactly that)
- in more recent memory (https://issues.apache.org/jira/browse/AVRO-2035)
validation for default values was tightened - which is great. but some things
like a default value of "true" (a json string) for a boolean property could have
been kept around initially (with a very nagging error printed at parse time)
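
For the union discriminator example above, the lenient matching could look roughly like the sketch below. This is not the avro-util JsonDecoder and not avro's API. `UnionBranchResolver` and `resolve` are hypothetical names, and branches are represented as plain full-name strings to keep the example self-contained.

```java
final class UnionBranchResolver {
    /**
     * Returns the index of the union branch that the label refers to, or -1.
     * A full name always wins. A simple name is accepted only when exactly one
     * branch has that simple name, so ambiguous input is still rejected.
     */
    static int resolve(String label, String... branchFullNames) {
        // 1. exact match on the full name (the newer form)
        for (int i = 0; i < branchFullNames.length; i++) {
            if (branchFullNames[i].equals(label)) {
                return i;
            }
        }
        // 2. fallback: match on the simple name (the older form), if unambiguous
        int found = -1;
        for (int i = 0; i < branchFullNames.length; i++) {
            String full = branchFullNames[i];
            String simple = full.substring(full.lastIndexOf('.') + 1);
            if (simple.equals(label)) {
                if (found >= 0) {
                    return -1; // two branches share this simple name
                }
                found = i;
            }
        }
        return found;
    }
}
```

For instance, `resolve("Foo", "com.a.Foo", "com.b.Bar")` picks the first branch, while the same label against `"com.a.Foo", "com.b.Foo"` is rejected as ambiguous.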
Issue Time Tracking
-------------------
Worklog Id: (was: 707984)
Time Spent: 2h 40m (was: 2.5h)
> Using builders leads to performance degradation
> -----------------------------------------------
>
> Key: AVRO-3048
> URL: https://issues.apache.org/jira/browse/AVRO-3048
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2, 1.10.1
> Reporter: Peter
> Assignee: Martin Jubelgas
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> When you do a .newBuilder() for avro generated classes, this will call
> org.apache.avro.specific.SpecificData.getForSchema:
>
> public static SpecificData getForSchema(Schema reader) {
>   if (reader.getType() == Type.RECORD) {
>     final String className = getClassName(reader);
>     if (className != null) {
>       final Class<?> clazz;
>       try {
>         clazz = Class.forName(className);
>         return getForClass(clazz);
>       } catch (ClassNotFoundException e) {
>         // class not visible to this class loader: use the default instance
>         return SpecificData.get();
>       }
>     }
>   }
>   // not a record, or no class name: use the default instance
>   return SpecificData.get();
> }
>
> which then seems to seldom find the class inside the try block, so many
> ClassNotFoundExceptions are thrown.
> Throwing exceptions internally carries a large performance penalty, and in
> practice users of avro 1.9.x and 1.10.x in high performance applications are
> forced not to use builders.
>
> Information about the same problem is also found on:
> [https://forums.databricks.com/questions/50803/orgapacheavrospecificspecificdatagetforschema-sear.html]
> The problem exists on at least 1.9.2 and 1.10.1 (but not on 1.7.x) in an OSGi
> environment
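
One general way to avoid paying for the same failed lookup repeatedly, as described in the issue above, is to remember negative results. The sketch below is only an illustration of that idea and is not claimed to be the change made for 1.11.0. `ClassLookupCache`, `find` and `loadAttempts` are hypothetical names. Note that keying on the class name alone ignores which class loader is in play, which matters in OSGi, so a real solution would need to account for that.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ClassLookupCache {
    // Caches both hits and misses, so a missing class costs one exception in total.
    private final ConcurrentMap<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();

    // Counts real Class.forName attempts; exposed only to make the effect observable.
    final AtomicInteger loadAttempts = new AtomicInteger();

    Optional<Class<?>> find(String className) {
        return cache.computeIfAbsent(className, name -> {
            loadAttempts.incrementAndGet();
            try {
                return Optional.<Class<?>>of(Class.forName(name));
            } catch (ClassNotFoundException e) {
                return Optional.empty(); // remember the miss instead of retrying
            }
        });
    }
}
```

Calling `find` twice with the same missing class name triggers only one `Class.forName` attempt in this sketch; the second call is served from the map.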