[https://issues.apache.org/jira/browse/AVRO-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337394#comment-17337394]
Martin Jubelgas commented on AVRO-3048:
---------------------------------------
We've also recently encountered the massive classloader overhead that currently
comes with using Builder.newInstance(schema), which makes the builder
unattractive to use in high-performance scenarios even when there are no
classloading exceptions involved.
What beats me is why the Builder constructor needs to create its own
SpecificData instance in the first place, when the enclosing specific record
already holds a suitable one in its MODEL$ field, which I think should be
reasonable and safe to reuse.
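To illustrate the suggestion, here is a minimal, self-contained sketch of a generated record whose builder reuses the record's own model instead of resolving a new one per builder. All names (SharedModelDemo, Model, GeneratedRecord) are hypothetical stand-ins, with Model playing the role of SpecificData; this is not Avro's generated code or API.

```java
import java.util.Objects;

// Hypothetical stand-ins, not Avro's API: Model plays the role of SpecificData,
// GeneratedRecord mimics a generated specific record with a static MODEL$ field.
public class SharedModelDemo {

    /** Stand-in for SpecificData; counts how many instances get created. */
    static final class Model {
        static int instances = 0;

        Model() {
            instances++;
        }
    }

    /** Mimics a generated record that already holds its model in MODEL$. */
    static final class GeneratedRecord {
        static final Model MODEL$ = new Model();

        static Builder newBuilder() {
            // Hand the record's existing model to the builder instead of
            // resolving one by class name on every call.
            return new Builder(MODEL$);
        }

        static final class Builder {
            final Model model;

            private Builder(Model model) {
                this.model = Objects.requireNonNull(model);
            }
        }
    }

    public static void main(String[] args) {
        GeneratedRecord.Builder a = GeneratedRecord.newBuilder();
        GeneratedRecord.Builder b = GeneratedRecord.newBuilder();
        System.out.println(a.model == b.model); // true: both builders share MODEL$
        System.out.println(Model.instances);    // 1: no per-builder model was created
    }
}
```

Because every builder receives the same MODEL$ reference, the builder path performs no class-name lookup and no Class.forName call at all.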
Providing a fix for newly generated code should therefore be fairly easy and
straightforward. Fixing the regression for existing generated code will be a
lot more intrusive, since it would likely require some kind of cache, and I am
not quite sure how to do that without incurring some synchronization overhead
(a first step might be to make that cache optional, so it doesn't break or
worsen anything).
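One way such an optional cache might look is sketched below, assuming the expensive part is the repeated Class.forName call and its ClassNotFoundException. The class ClassLookupCache is hypothetical and not part of Avro. It memoizes the outcome of resolving a class name, including failures, in a ConcurrentHashMap; computeIfAbsent applies the mapping function at most once per key, so each name triggers at most one Class.forName attempt and at most one exception.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache, not part of Avro: remembers whether a class name could be
// resolved, so repeated lookups neither call Class.forName again nor throw again.
public class ClassLookupCache {

    private final ConcurrentMap<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
    private final ClassLoader loader;

    // Counts real Class.forName attempts, to show that misses are cached too.
    final AtomicInteger forNameCalls = new AtomicInteger();

    public ClassLookupCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class if it can be loaded, otherwise empty; each name is resolved once. */
    public Optional<Class<?>> lookup(String className) {
        return cache.computeIfAbsent(className, this::load);
    }

    private Optional<Class<?>> load(String className) {
        forNameCalls.incrementAndGet();
        try {
            return Optional.of(Class.forName(className, false, loader));
        } catch (ClassNotFoundException e) {
            return Optional.empty(); // the negative result is cached as well
        }
    }

    public static void main(String[] args) {
        ClassLookupCache c = new ClassLookupCache(ClassLookupCache.class.getClassLoader());
        for (int i = 0; i < 1000; i++) {
            c.lookup("java.lang.String");
            c.lookup("com.example.DoesNotExist");
        }
        System.out.println(c.forNameCalls.get()); // 2
    }
}
```

One caveat: a long-lived cache that holds Class references can keep class loaders (for example OSGi bundle loaders) from being garbage-collected, so a real implementation might need weak references or explicit invalidation.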
I'll try to submit a PR that addresses the issue at least for newly generated
specific records, and take some more time to see whether performance can also
be improved for existing specific records that cannot simply be regenerated.
> Using builders leads to performance degradation
> -----------------------------------------------
>
> Key: AVRO-3048
> URL: https://issues.apache.org/jira/browse/AVRO-3048
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2, 1.10.1
> Reporter: Peter
> Priority: Major
>
> When you call .newBuilder() on Avro-generated classes, it calls
> org.apache.avro.specific.SpecificData.getForSchema:
>
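>   public static SpecificData getForSchema(Schema reader) {
>     if (reader.getType() == Type.RECORD) {
>       final String className = getClassName(reader);
>       if (className != null) {
>         final Class<?> clazz;
>         try {
>           clazz = Class.forName(className);
>           return getForClass(clazz);
>         } catch (ClassNotFoundException e) {
>           // Misses are not remembered: every call with an unloadable
>           // class name throws and catches this exception again.
>           return SpecificData.get();
>         }
>       }
>     }
>     // Non-record schema or no class name: fall back to the default model.
>     return SpecificData.get();
>   }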
>
> The lookup inside the try block seldom seems to succeed, so a lot of
> ClassNotFoundExceptions are thrown.
> Throwing exceptions internally carries a large performance penalty, so in
> practice users of Avro 1.9.x and 1.10.x in high-performance applications are
> forced to avoid builders.
>
> Information about the same problem can also be found at:
> [https://forums.databricks.com/questions/50803/orgapacheavrospecificspecificdatagetforschema-sear.html]
> The problem exists in at least 1.9.2 and 1.10.1 (but not in 1.7.x) in an OSGi
> environment.
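A plausible explanation for why the lookup fails mostly under OSGi: the one-argument Class.forName(String) resolves the name against the defining class loader of the calling class, here SpecificData's loader, which in OSGi belongs to the Avro bundle and may not see the application's generated classes. The sketch below (hypothetical names LoaderVisibilityDemo and FakeRecord) reproduces the effect in plain Java: the same class is found through the application loader but not through an isolated loader whose parent is the bootstrap loader.

```java
// Illustrative only: Class.forName resolves names against a specific class
// loader, so a loader that cannot see a class fails with ClassNotFoundException
// even though the class is already loaded elsewhere in the JVM.
public class LoaderVisibilityDemo {

    /** Hypothetical stand-in for a generated record class. */
    static final class FakeRecord { }

    static boolean canLoad(String name, ClassLoader loader) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = FakeRecord.class.getName(); // binary name, ending in $FakeRecord
        ClassLoader appLoader = LoaderVisibilityDemo.class.getClassLoader();
        // A loader whose parent is the bootstrap loader cannot see application
        // classes, much like a bundle loader that does not import the record's package.
        ClassLoader isolated = new ClassLoader(null) { };

        System.out.println(canLoad(name, appLoader)); // true
        System.out.println(canLoad(name, isolated));  // false
    }
}
```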