-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/
-----------------------------------------------------------

(Updated March 9, 2015, 5:32 p.m.)


Review request for hive, Ryan Blue and cheng xu.


Changes
-------

New patch that addresses the review feedback.


Bugs: HIVE-9658
    https://issues.apache.org/jira/browse/HIVE-9658


Repository: hive-git


Description
-------

This patch passes primitive Java objects directly to the Hive object
inspectors instead of wrapping them in primitive Writable objects.
This helps to reduce memory usage.

I did not apply the same bypass to the other types, such as binary, decimal
and date/timestamp, because their Writable objects are needed in other parts
of the code, and creating them later costs more in ops/s. It is better to pay
that cost once, up front.
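
As a rough illustration of the idea described above, the sketch below shows an inspector that accepts either a wrapped value or a plain Java object. It is not taken from the patch: `IntBox` is a hypothetical stand-in for a primitive Writable, and `ObjectRow` and `inspectInt` are made-up names used only to show the two code paths side by side.

```java
public class BypassSketch {

    /** Hypothetical stand-in for a primitive Writable wrapper (assumption, not a Hive class). */
    static final class IntBox {
        private final int value;

        IntBox(int value) {
            this.value = value;
        }

        int get() {
            return value;
        }
    }

    /** Hypothetical row container that stores plain Objects (assumption, not a Hive class). */
    static final class ObjectRow {
        private final Object[] values;

        ObjectRow(Object... values) {
            this.values = values;
        }

        Object get(int index) {
            return values[index];
        }
    }

    /**
     * Hypothetical inspector method: returns the int value whether the field
     * holds a plain java.lang.Integer or the wrapper type.
     */
    static int inspectInt(Object field) {
        if (field instanceof Integer) {
            return (Integer) field;          // bypass path: no wrapper object involved
        }
        if (field instanceof IntBox) {
            return ((IntBox) field).get();   // legacy path: unwrap the wrapper
        }
        throw new IllegalArgumentException("Unsupported field type: " + field);
    }

    public static void main(String[] args) {
        ObjectRow wrapped = new ObjectRow(new IntBox(7)); // value wrapped before inspection
        ObjectRow direct = new ObjectRow(7);              // value handed over as a Java object

        System.out.println(inspectInt(wrapped.get(0)));
        System.out.println(inspectInt(direct.get(0)));
    }
}
```

Both rows yield the same value to the caller; the difference is only in which objects the reader has to create before the inspector sees them.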


Diffs (updated)
-----

  
  itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java 4f6985cd13017ce37f4f0c100b16a27aa5b02f8b
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f728fc9b27da0fabefab5d8f5faa53640b78
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229723cc3ecef551fa44b8456b0d2ac93fb5
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52614771857d1b21971a66894841c248ef9
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 6ff6b473c9f1867bc14bb597094ddb92487cc954
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 3fc012970e23bbc188ce2a2e2ba0b04bc6f22317
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java f1c8b6f13718b37f590263e5b35ed6c327f5cf4f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java c6d03a19029d5bcc86b998dd7a8609973648c103
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java f95d15eddc21bc432fa53572de5756751a13341a
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java ee57b31dac53d99af0c5a520f51102796ca32fd3
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f69d13cdf6801f6dcc247100eaa71f84d45b57a0
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 49bf1c5325833993f4c09efdf1546af560783c28
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 609188206f88e296d893b84bcaaab53f974e6b7d
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java 143d72e76502d4877e8208181d9743259051dcea
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java 22250b30a14d52907fb22d4f44b93c7633c6a89e
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java 864f56292fa4856df155f546064e4a6732cc663f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java 39f265777c7e164382117e3902c3b6e491295f70
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 3a476731e31bf38822f0d530f0aea2eadb675a49
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestArrayCompatibility.java d45d8eeb9e8a61f254098ab15d0305fc71152abd
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 8f03c5b403332f7b36b2271a2246a0fc90b3bfba
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapStructures.java 3c7401ffbe88ce66b96f9cceab4e9c3d6267f8fe
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java 1a54bf5797efd5859c9e665bcc7134168e5d193f
  serde/src/java/org/apache/hadoop/hive/serde2/io/ObjectArrayWritable.java PRE-CREATION

Diff: https://reviews.apache.org/r/31800/diff/


Testing
-------

Performance tests were run to validate the change.

Schema: 
int,double,boolean,string,array<int>,map<string,string>,struct<a:int,b:int>
  
- JMH microbenchmarks of Parquet reads.
  
  Before: 579 ops/s
  After:  651 ops/s

- YourKit Java Profiler, used to measure the objects recorded in memory.
  Reading 20,000 random rows, repeated 10 times.
  
  Before:
     Objects recorded:   1,863,610
     Objects size:       42,373,808
     Total memory usage: 29%
     
  After:
     Objects recorded:   1,596,804
     Objects size:       34,192,832
     Total memory usage: 24%

All tests were run multiple times and gave consistent results.
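
For readers who want the relative changes, the before/after figures above can be turned into percentages with a small helper like the one below. The class and method names are illustrative only; the input numbers are the ones reported in this section.

```java
import java.util.Locale;

public class BenchmarkDelta {

    /** Relative change from 'before' to 'after', in percent (negative means a reduction). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the Testing section of this review request.
        double throughput = percentChange(579, 651);                 // JMH ops/s
        double objectCount = percentChange(1_863_610, 1_596_804);    // objects recorded
        double objectSize = percentChange(42_373_808, 34_192_832);   // objects size

        System.out.printf(Locale.ROOT, "Throughput:       %+.1f%%%n", throughput);
        System.out.printf(Locale.ROOT, "Objects recorded: %+.1f%%%n", objectCount);
        System.out.printf(Locale.ROOT, "Objects size:     %+.1f%%%n", objectSize);
    }
}
```

This works out to roughly 12% more ops/s, about 14% fewer recorded objects, and about 19% less object size; the total memory usage figure drops by 5 percentage points (29% to 24%).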


Thanks,

Sergio Pena
