[ 
https://issues.apache.org/jira/browse/AVRO-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545431#comment-17545431
 ] 

Christophe Le Saec commented on AVRO-3527:
------------------------------------------

For the [hashCode method|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1099-L1111],
 it should be possible, for records and arrays, to limit how much is hashed 
({_}the number of fields in a record, the number of elements in an array, and 
the depth when a value is itself a record or array{_}), since hashCode doesn't 
always need to differentiate objects.
But I can't see what kind of improvement we can make to the equals method (which 
calls 
[compare|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1144]),
 as we have to compare all elements until we find a difference.
Any ideas?
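
To illustrate the hashCode idea, here is a minimal sketch. It is not Avro code: 
the class name BoundedHash and the limits MAX_ELEMENTS and MAX_DEPTH are 
hypothetical, and a java.util.List stands in for both a record (its field 
values) and an array. It only shows that capping the element count and the 
nesting depth still keeps equal values on equal hashes.

```java
import java.util.List;

/** Hypothetical helper, not part of Avro: hashes only a bounded part of a nested value. */
final class BoundedHash {
    static final int MAX_ELEMENTS = 8; // assumed cap on elements hashed per level
    static final int MAX_DEPTH = 3;    // assumed cap on nesting depth

    static int hash(Object value) {
        return hash(value, 0);
    }

    private static int hash(Object value, int depth) {
        if (value == null) {
            return 0;
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            // Mix in the size so containers of different lengths usually differ.
            int h = 31 + list.size();
            if (depth >= MAX_DEPTH) {
                return h; // too deep: stop descending
            }
            // Only the first MAX_ELEMENTS entries contribute to the hash.
            int limit = Math.min(list.size(), MAX_ELEMENTS);
            for (int i = 0; i < limit; i++) {
                h = 31 * h + hash(list.get(i), depth + 1);
            }
            return h;
        }
        // Leaf values fall back to their own hashCode.
        return value.hashCode();
    }
}
```

Two values that differ only beyond the limits collide, which the hashCode 
contract allows; equals still has to tell them apart, so the limits trade 
hashing cost against more collisions in hash-based collections.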

> Generated equals() and hashCode() for SpecificRecords
> -----------------------------------------------------
>
>                 Key: AVRO-3527
>                 URL: https://issues.apache.org/jira/browse/AVRO-3527
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Steven Aerts
>            Priority: Major
>         Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, 
> flame_graph.jpeg
>
>
> When profiling our production system, we found that it was spending almost 
> 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and 
> {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic, almost all of the time is spent in those 
> functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> After generating {{.equals()}} and {{.hashCode()}}, all this overhead 
> disappeared and the application became 35% faster overall. 
> Thanks to this improvement, other Avro-heavy applications also showed 
> noticeable performance gains where we hadn't expected them.
> A generated implementation of {{.hashCode()}} is 5 to 10 times faster 
> than its generic counterpart; for {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
