-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/
-----------------------------------------------------------

(Updated Aug. 30, 2013, 6:49 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Updated with Jakob's comments


Bugs: HIVE-4732
    https://issues.apache.org/jira/browse/HIVE-4732


Repository: hive-git


Description
-------

From our performance analysis, we found that AvroSerde's schema.equals() call
consumed a substantial amount (nearly 40%) of time. This patch intends to
minimize the number of schema.equals() calls by deferring the check as late as
possible and performing it as few times as possible.

First, we add a unique ID to each record reader, which is then included in
every AvroGenericRecordWritable. Then we introduce two new data structures
(one HashSet and one HashMap) to store intermediate data and avoid duplicate
checks. The HashSet contains the IDs of all record readers that don't need any
re-encoding. The HashMap contains the re-encoders that have already been used;
it works as a cache and allows re-encoders to be reused. With this change, our
tests show a nearly 40% reduction in Avro record reading time.
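
The caching idea above can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the code in the patch: Schema and ReEncoder
are hypothetical stand-ins for the Avro schema and the Hive re-encoder, the use
of java.util.UUID as the reader ID type is an assumption, and keying the
HashMap by reader ID is also an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: per-reader caching so the expensive equals() runs once per reader.
class SchemaCheckCache {

    // Stand-in for an Avro schema (assumption); counts equals() calls for illustration.
    static final class Schema {
        static int equalsCalls = 0;
        final String json;

        Schema(String json) { this.json = json; }

        @Override
        public boolean equals(Object other) {
            equalsCalls++; // this is the call the patch tries to avoid repeating
            return other instanceof Schema && ((Schema) other).json.equals(json);
        }

        @Override
        public int hashCode() { return json.hashCode(); }
    }

    // Stand-in for a re-encoder from a writer schema to the reader schema (assumption).
    static final class ReEncoder {
        final Schema writerSchema;
        final Schema readerSchema;

        ReEncoder(Schema writerSchema, Schema readerSchema) {
            this.writerSchema = writerSchema;
            this.readerSchema = readerSchema;
        }
    }

    private final Schema readerSchema;
    // IDs of record readers whose records need no re-encoding.
    private final Set<UUID> noEncodingNeeded = new HashSet<>();
    // Re-encoders already built, kept for reuse.
    private final Map<UUID, ReEncoder> reEncoderCache = new HashMap<>();

    SchemaCheckCache(Schema readerSchema) { this.readerSchema = readerSchema; }

    // Returns null when no re-encoding is needed, otherwise a cached re-encoder.
    ReEncoder reEncoderFor(UUID readerId, Schema writerSchema) {
        if (noEncodingNeeded.contains(readerId)) {
            return null; // already known to match; skip equals()
        }
        ReEncoder cached = reEncoderCache.get(readerId);
        if (cached != null) {
            return cached; // already known to differ; reuse the re-encoder
        }
        // First record from this reader: pay for equals() exactly once.
        if (writerSchema.equals(readerSchema)) {
            noEncodingNeeded.add(readerId);
            return null;
        }
        ReEncoder created = new ReEncoder(writerSchema, readerSchema);
        reEncoderCache.put(readerId, created);
        return created;
    }

    public static void main(String[] args) {
        SchemaCheckCache cache = new SchemaCheckCache(new Schema("v2"));
        UUID readerId = UUID.randomUUID();
        for (int i = 0; i < 1000; i++) {
            cache.reEncoderFor(readerId, new Schema("v2"));
        }
        System.out.println("equals() calls: " + Schema.equalsCalls);
    }
}
```

In this sketch every record after the first one from a given reader is resolved
by a hash lookup on the reader ID, so the cost of the schema comparison is paid
once per reader rather than once per record.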
 
   


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java ed2a9af 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e994411 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 3828940 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 

Diff: https://reviews.apache.org/r/12480/diff/


Testing
-------


Thanks,

Mohammad Islam
