[
https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359157#comment-17359157
]
Ivan Podhornyi commented on HIVE-16220:
---------------------------------------
[~james601232]
Got the same issue, and after a week of research I found a few solutions:
Remove the .cache() calls on DataFrames from your code, because caching creates a
SessionState, which is a full copy of the session.
If not caching the DataFrame is a showstopper for you, [here is a Scala
method|https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L189],
where I guess it is possible to create a clone of the SparkSession for each
batch, process the batch, and then close the clone. The overhead still needs to be checked.
> Memory leak when creating a table using location and NameNode in HA
> -------------------------------------------------------------------
>
> Key: HIVE-16220
> URL: https://issues.apache.org/jira/browse/HIVE-16220
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 1.2.1, 2.3.4, 3.0.0
> Environment: HDP-2.4.0.0
> HDP-3.1.0.0
> Reporter: Angel Alvarez Pascua
> Priority: Major
>
> The following simple DDL
> CREATE TABLE `test`(`field` varchar(1)) LOCATION
> 'hdfs://benderHA/apps/hive/warehouse/test'
> ends up generating a huge memory leak in the HiveServer2 service.
> After two weeks without a restart, the service stops suddenly because of
> OutOfMemory errors.
> This only happens in environments where the NameNode is in HA; otherwise
> (strangely enough) nothing happens. If the LOCATION clause is not
> present, everything is also fine.
> It seems that multiple instances of the Hadoop configuration are created when
> we're in an HA environment:
> <AFTER ONE EXECUTION OF CREATE TABLE WITH LOCATION>
> 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88"
> occupy 350.263.816 (81,66%) bytes. These instances are referenced from one
> instance of "java.util.HashMap$Node[]",
> loaded by "<system class loader>"
> <AFTER TWO EXECUTIONS OF CREATE TABLE WITH LOCATION>
> 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88"
> occupy 699.901.416 (87,32%) bytes. These instances are referenced from one
> instance of "java.util.HashMap$Node[]",
> loaded by "<system class loader>"
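As a rough sanity check on the two heap-dump lines quoted above (the dump tool prints them with "." as the thousands separator and "," as the decimal mark), the snapshots can be compared directly. The sketch below only does arithmetic on the figures from the report; the class and method names are hypothetical. It makes no assumption about the leak's root cause, and because it works with the difference between the snapshots it does not need the (unreported) baseline before the first execution.

```java
// Arithmetic on the two heap-dump snapshots quoted in HIVE-16220.
// Class and method names are hypothetical; the numbers are copied from the report.
public class HeapSnapshotGrowth {

    // "N instances of org.apache.hadoop.conf.Configuration ... occupy B bytes"
    static final long INSTANCES_AFTER_ONE = 2_618L;
    static final long BYTES_AFTER_ONE = 350_263_816L;
    static final long INSTANCES_AFTER_TWO = 5_216L;
    static final long BYTES_AFTER_TWO = 699_901_416L;

    /** Average retained size of one Configuration instance, in bytes. */
    static double bytesPerInstance(long bytes, long instances) {
        return (double) bytes / instances;
    }

    /** Configuration instances added by the second CREATE TABLE ... LOCATION. */
    static long extraInstances() {
        return INSTANCES_AFTER_TWO - INSTANCES_AFTER_ONE;
    }

    /** Retained bytes added by the second CREATE TABLE ... LOCATION. */
    static long extraBytes() {
        return BYTES_AFTER_TWO - BYTES_AFTER_ONE;
    }

    public static void main(String[] args) {
        System.out.printf("avg size after 1 run : %.0f bytes%n",
                bytesPerInstance(BYTES_AFTER_ONE, INSTANCES_AFTER_ONE));
        System.out.printf("avg size after 2 runs: %.0f bytes%n",
                bytesPerInstance(BYTES_AFTER_TWO, INSTANCES_AFTER_TWO));
        System.out.printf("growth from 2nd run  : %d instances, %d bytes (%.1f MiB)%n",
                extraInstances(), extraBytes(), extraBytes() / (1024.0 * 1024.0));
    }
}
```

Run as is, it reports an average of about 134,000 bytes per Configuration in both snapshots, and about 2,600 extra instances (roughly 333 MiB) retained after the second execution. Each statement therefore seems to leave behind a similar batch of new configurations, which is consistent with the report's observation that multiple instances are created per CREATE TABLE ... LOCATION in the HA setup.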
--
This message was sent by Atlassian Jira
(v8.3.4#803005)