Misha Dmitriev created HIVE-15882:
-------------------------------------
Summary: HS2 generating high memory pressure with many partitions
and concurrent queries
Key: HIVE-15882
URL: https://issues.apache.org/jira/browse/HIVE-15882
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Misha Dmitriev
Assignee: Misha Dmitriev
I've created a Hive table with 2000 partitions, each backed by two files, with
one row in each file. When I execute some number of concurrent queries against
this table, e.g. as follows
{code}
for i in `seq 1 50`; do
  beeline -u jdbc:hive2://localhost:10000 -n admin -p admin \
    -e "select count(i_f_1) from misha_table;" &
done
{code}
it results in a big memory spike. With 20 queries I caused an OOM in an HS2
server running with -Xmx200m, and with 50 queries in one running with -Xmx500m.
I am attaching the results of jxray (www.jxray.com) analysis of a heap dump
that was generated in the 50 queries / 500m heap scenario. It suggests that
there are several opportunities to reduce memory pressure with fairly
non-invasive changes to the code:
1. 24.5% of memory is wasted by duplicate strings (see section 6). With
String.intern() calls added in the ~10 relevant places in the code, this
overhead can be substantially reduced.
2. Almost 20% of memory is wasted due to various suboptimally used collections
(see section 8). There are many maps and lists that are either empty or have
just 1 element. By modifying the code that creates and populates these
collections, we can likely save 5-10% of memory.
3. Almost 20% of memory is used by instances of java.util.Properties. It looks
like these objects are heavily duplicated, since for each partition each
concurrently running query creates its own copy of Partition, PartitionDesc and
Properties. Thus we have nearly 100,000 (50 queries * 2,000 partitions)
Properties in memory. By interning/deduplicating these objects we may be able
to save perhaps 15% of memory.
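As a rough illustration of items 1 and 3, the sketch below interns the strings inside a Properties object and then shares one instance per distinct content. The class PropertiesDedup, its pool and its method names are hypothetical and are not existing Hive code. It assumes that the shared instances are never mutated after being pooled, and it uses a strong-reference pool for brevity; a real change would probably need weak references so that unused entries can be collected.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Hive: deduplicates Properties by content.
public class PropertiesDedup {
    // Assumption: pooled instances are treated as read-only by all callers.
    // Strong references are used here only to keep the sketch short.
    private static final ConcurrentHashMap<Properties, Properties> POOL =
        new ConcurrentHashMap<>();

    /** Returns a copy of props whose String keys and values are interned. */
    public static Properties internStrings(Properties props) {
        Properties out = new Properties();
        for (String name : props.stringPropertyNames()) {
            out.setProperty(name.intern(), props.getProperty(name).intern());
        }
        return out;
    }

    /** Returns one shared instance for every Properties with equal content. */
    public static Properties canonical(Properties props) {
        Properties candidate = internStrings(props);
        // Properties compares by content, so equal copies map to one entry.
        Properties existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

With such a helper, two queries that build equal Properties for the same partition would both end up holding the object returned by the first canonical() call, instead of two separate copies.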
So overall, I think there is a good chance to reduce HS2 memory consumption in
this scenario by ~40%.
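For the collection overhead in item 2, one possible approach is sketched below: once a list is fully populated, replace it with the smallest representation that holds its contents. CompactLists and compact() are hypothetical names, not existing Hive code, and the sketch assumes that callers only read the returned list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: shrinks small or oversized lists.
public class CompactLists {
    /** Returns a list with the same elements and a minimal footprint. */
    public static <T> List<T> compact(List<T> src) {
        switch (src.size()) {
            case 0:
                // Shared immutable instance, no per-list allocation.
                return Collections.emptyList();
            case 1:
                // Single field instead of a backing array.
                return Collections.singletonList(src.get(0));
            default:
                // Copy constructor sizes the backing array to the content.
                return new ArrayList<>(src);
        }
    }
}
```

The same idea applies to maps through Collections.emptyMap() and Collections.singletonMap().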