[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706360#comment-13706360
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
---------------------------------------------

A new patch has been uploaded to RB: https://reviews.apache.org/r/12480/

Description copied from RB:
>From our performance analysis, we found that AvroSerde's schema.equals() call 
>consumed a substantial amount (nearly 40%) of time. This patch intends to 
>minimize the number of schema.equals() calls by making the check as late and 
>as infrequent as possible.

First, we added a unique ID to each record reader, which is then included in 
every AvroGenericRecordWritable. Then, we introduced two new data structures 
(one HashSet and one HashMap) to store intermediate data and avoid duplicate 
checks. The HashSet contains the IDs of all record readers that don't need any 
re-encoding. The HashMap, on the other hand, contains the re-encoders that have 
already been used. It works as a cache and allows re-encoders to be reused. 
With this change, our tests show a nearly 40% reduction in Avro record reading 
time.
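
The caching scheme described above can be sketched roughly as follows. This is
only an illustration, not the code from the patch: FakeSchema, ReEncoder,
ReaderSchemaCache and lookup() are hypothetical names, java.util.UUID is just
one possible choice of reader ID, and keying the HashMap by reader ID is an
assumption. FakeSchema stands in for an Avro schema and counts equals() calls
so the saving is visible.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: none of these names come from the actual patch.
class ReaderSchemaCache {

    /** Stand-in for an Avro schema; counts equals() calls to expose their cost. */
    static final class FakeSchema {
        static int equalsCalls = 0;
        final String json;

        FakeSchema(String json) { this.json = json; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof FakeSchema && json.equals(((FakeSchema) o).json);
        }

        @Override public int hashCode() { return json.hashCode(); }
    }

    /** Stand-in for an object that re-encodes records into the reader schema. */
    static final class ReEncoder {
        final FakeSchema writerSchema;

        ReEncoder(FakeSchema writerSchema) { this.writerSchema = writerSchema; }
    }

    private final FakeSchema readerSchema;

    // IDs of record readers whose records need no re-encoding.
    private final Set<UUID> noReEncodingNeeded = new HashSet<>();

    // Re-encoders already built, keyed by record reader ID (an assumption).
    private final Map<UUID, ReEncoder> reEncoderCache = new HashMap<>();

    ReaderSchemaCache(FakeSchema readerSchema) { this.readerSchema = readerSchema; }

    /**
     * Returns null when records from this reader need no re-encoding,
     * otherwise a (possibly cached) re-encoder. The expensive equals()
     * runs at most once per record reader ID.
     */
    ReEncoder lookup(UUID recordReaderId, FakeSchema writerSchema) {
        if (noReEncodingNeeded.contains(recordReaderId)) {
            return null;                      // already known to match
        }
        ReEncoder cached = reEncoderCache.get(recordReaderId);
        if (cached != null) {
            return cached;                    // reuse the existing re-encoder
        }
        if (writerSchema.equals(readerSchema)) {
            noReEncodingNeeded.add(recordReaderId);
            return null;
        }
        ReEncoder created = new ReEncoder(writerSchema);
        reEncoderCache.put(recordReaderId, created);
        return created;
    }
}
```

In this sketch, every record after the first from a given reader is resolved
by a lookup on the reader ID alone, so the number of schema comparisons grows
with the number of record readers rather than the number of records.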

   
                
> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> ---------------------------------------------------------------------
>
>                 Key: HIVE-4732
>                 URL: https://issues.apache.org/jira/browse/HIVE-4732
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
