-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/
-----------------------------------------------------------
Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J, Sergio
Pena, Sahil Takiar, Vihang Karajgaonkar, and Xuefu Zhang.
Bugs: HIVE-16079
https://issues.apache.org/jira/browse/HIVE-16079
Repository: hive-git
Description
-------
When multiple Hive queries run concurrently, a separate copy of
org.apache.hadoop.hive.ql.metadata.Partition and
org.apache.hadoop.hive.ql.plan.PartitionDesc is created for each table
partition in each query instance. In the benchmark described in
HIVE-16079, with 2000 partitions and 50 concurrent queries running
over them, we end up, in the worst case, with 2000*50=100,000 instances
of Partition and PartitionDesc in memory. These objects themselves
collectively take just ~2% of memory. However, the other data
structures that they reference take a lot more. In particular,
Properties objects take more than 20% of memory. With 50 concurrent
read-only queries, there are 50 identical copies of Properties for
each partition, which is a huge waste of memory.
This change introduces a new class that extends Properties, called
CopyOnFirstWriteProperties. It uses a single interned copy of
Properties whenever possible. When one of the methods that modify
properties is called, the instance first creates its own private copy
and applies the change to that copy. With this class in use, memory
consumption by Properties falls from 20% to 5-6%.
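For reviewers who want the idea in miniature, here is a rough sketch of
the copy-on-first-write technique described above. It is not the code in
this patch: the class name CopyOnFirstWriteSketch, the POOL map, and the
helpers isShared() and sharesStorageWith() are hypothetical names made up
for illustration. The sketch also assumes a simple strong-reference pool
and overrides only a handful of methods, whereas a complete version would
have to cover every read and write method of Properties.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; names and structure are assumptions, not the patch.
class CopyOnFirstWriteSketch extends Properties {
    private static final long serialVersionUID = 1L;

    // Hypothetical pool holding one canonical instance per distinct content.
    // Pooled instances are never modified after they are published.
    private static final ConcurrentHashMap<Properties, Properties> POOL =
            new ConcurrentHashMap<>();

    // Shared, read-only backing store; null once this instance owns a copy.
    private transient Properties shared;

    CopyOnFirstWriteSketch(Properties source) {
        // Snapshot the entries (defaults of 'source' are not copied here).
        Properties snapshot = new Properties();
        snapshot.putAll(source);
        Properties existing = POOL.putIfAbsent(snapshot, snapshot);
        shared = (existing != null) ? existing : snapshot;
    }

    // On the first write, copy the shared entries into this instance.
    private synchronized void detach() {
        if (shared != null) {
            Properties s = shared;
            shared = null;      // clear first so nested put() calls do not recurse
            super.putAll(s);
        }
    }

    // Reads go to the shared store until the first write happens.
    @Override
    public synchronized String getProperty(String key) {
        return (shared != null) ? shared.getProperty(key) : super.getProperty(key);
    }

    @Override
    public synchronized Object get(Object key) {
        return (shared != null) ? shared.get(key) : super.get(key);
    }

    @Override
    public synchronized int size() {
        return (shared != null) ? shared.size() : super.size();
    }

    // Writes detach first; setProperty() is routed through put().
    @Override
    public synchronized Object put(Object key, Object value) {
        detach();
        return super.put(key, value);
    }

    @Override
    public synchronized Object remove(Object key) {
        detach();
        return super.remove(key);
    }

    // True while this instance still reads from the pooled copy.
    synchronized boolean isShared() {
        return shared != null;
    }

    // True if both instances currently read from the same pooled copy.
    boolean sharesStorageWith(CopyOnFirstWriteSketch other) {
        Properties mine;
        synchronized (this) {
            mine = shared;
        }
        synchronized (other) {
            return mine != null && mine == other.shared;
        }
    }
}
```

In this sketch, two instances built from equal Properties read from the
same pooled object until one of them is modified; the modified instance
then holds its own copy and the other is unaffected. A pool based on
strong references never releases its entries, so a real implementation
would more likely want a weak interner.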
Diffs
-----
common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
247d5890ea8131404b9543d22876ca4c052578e0
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
d05c1c68fdb7296c0346d73967071da1ebe7bb72
Diff: https://reviews.apache.org/r/57353/diff/1/
Testing
-------
Thanks,
Misha Dmitriev