[ 
https://issues.apache.org/jira/browse/AVRO-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545745#comment-17545745
 ] 

Steven Aerts edited comment on AVRO-3527 at 6/3/22 5:53 AM:
------------------------------------------------------------

Hi [~clesaec],

thanks for responding so quickly.  I just uploaded my 
[PR#1708|https://github.com/apache/avro/pull/1708] with the proposed solution.

The performance gain comes from the fact that the JVM is tuned to optimize 
statements like the ones this patch will generate:
{code:java}
hashCode = 31 * hashCode + Integer.hashCode(this.number);
// OR
if (this.number != other.number) return false;{code}
But the GenericData implementation of those functions cannot be inlined 
and optimized that easily. 
Every time, it has to open the schema, loop over all its fields, retrieve the 
values of those fields in a generic way, and compare or hash them recursively.  
This is very expensive for the JVM.
The JVM has, for example, no way of knowing that the fields of a schema will not 
change between different calls on the same class.  So there is not much room 
for it to optimize that logic; it just has to execute it over and over on 
every call.
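
To make the contrast above concrete, here is a minimal, self-contained sketch. It is not the actual GenericData code nor the exact output of the PR; the class names {{GenericStyleRecord}} and {{GeneratedStyleRecord}} and the fields {{number}} and {{name}} are hypothetical and only illustrate the two styles (a loop driven by a field list versus straight-line code per field).

```java
import java.util.List;
import java.util.Objects;

public class EqualsHashCodeSketch {

    // Hypothetical stand-in for a schema-driven record: the field list plays
    // the role of the schema, and values are fetched generically by position.
    static class GenericStyleRecord {
        private final List<String> fieldNames;
        private final Object[] values;

        GenericStyleRecord(List<String> fieldNames, Object... values) {
            this.fieldNames = fieldNames;
            this.values = values;
        }

        Object get(int pos) {
            return values[pos];
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GenericStyleRecord)) return false;
            GenericStyleRecord other = (GenericStyleRecord) o;
            if (!fieldNames.equals(other.fieldNames)) return false;
            // Walks the "schema" on every call and compares boxed values.
            for (int i = 0; i < fieldNames.size(); i++) {
                if (!Objects.equals(get(i), other.get(i))) return false;
            }
            return true;
        }

        @Override
        public int hashCode() {
            int hashCode = 1;
            for (int i = 0; i < fieldNames.size(); i++) {
                hashCode = 31 * hashCode + Objects.hashCode(get(i));
            }
            return hashCode;
        }
    }

    // Hypothetical stand-in for a generated record: one statement per field,
    // with primitive comparisons and no schema lookup.
    static class GeneratedStyleRecord {
        private final int number;
        private final String name;

        GeneratedStyleRecord(int number, String name) {
            this.number = number;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GeneratedStyleRecord)) return false;
            GeneratedStyleRecord other = (GeneratedStyleRecord) o;
            if (this.number != other.number) return false;
            return Objects.equals(this.name, other.name);
        }

        @Override
        public int hashCode() {
            int hashCode = 1;
            hashCode = 31 * hashCode + Integer.hashCode(this.number);
            hashCode = 31 * hashCode + Objects.hashCode(this.name);
            return hashCode;
        }
    }

    public static void main(String[] args) {
        List<String> schema = List.of("number", "name");
        GenericStyleRecord g1 = new GenericStyleRecord(schema, 42, "avro");
        GenericStyleRecord g2 = new GenericStyleRecord(schema, 42, "avro");
        GeneratedStyleRecord s1 = new GeneratedStyleRecord(42, "avro");
        GeneratedStyleRecord s2 = new GeneratedStyleRecord(42, "avro");

        System.out.println(g1.equals(g2));
        System.out.println(s1.equals(s2));
        System.out.println(g1.hashCode() == s1.hashCode());
    }
}
```

Both styles give the same answers here; the difference is only in how much work is repeated per call. The generated style has a fixed, small body with primitive operations, which is the kind of code the JIT can inline easily.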



> Generated equals() and hashCode() for SpecificRecords
> -----------------------------------------------------
>
>                 Key: AVRO-3527
>                 URL: https://issues.apache.org/jira/browse/AVRO-3527
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Steven Aerts
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, 
> flame_graph.jpeg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When profiling our production system, we found that it was spending almost 
> 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and 
> {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic we see that almost all time is spent in those 
> functions, as can be seen in the attached flame graph (blue "pyramids"):
> !flame_graph.jpeg|width=385,height=99!
> By generating {{.equals()}} and {{.hashCode()}}, all this overhead 
> disappeared and the application became 35% faster overall. 
> On other Avro-heavy applications we also saw noticeable performance gains 
> from this improvement, in places where we hadn't expected them.
> A generated implementation of {{.hashCode()}} becomes 5 to 10 times faster 
> than its generic counterpart. For {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
