[
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366899#comment-14366899
]
Chengxiang Li commented on HIVE-10006:
--------------------------------------
Root Cause:
In RSC, Spark calls CombineHiveInputFormat::getSplits to split the job
into tasks, and that call runs in a thread named "dag-scheduler-event-loop".
The MapWork is added to that thread's ThreadLocal map and is never
removed. Because "dag-scheduler-event-loop" is a long-lived daemon
thread, every MapWork stays in the ThreadLocal map until the RSC JVM
crashes or exits.
Hive hits this issue in MR mode as well. It is only by luck that the thread
that calls CombineHiveInputFormat::getSplits there is a TaskRunner, which is
abandoned after the query finishes, so the Hive driver does not leak memory.
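The mechanism can be reproduced in isolation. The following is a minimal sketch, not Hive code: WORK_CACHE, getSplits, clear, and runQueries are hypothetical stand-ins for the per-thread map and the split computation described above, and a single-thread executor plays the role of the long-lived "dag-scheduler-event-loop" thread. It only illustrates that entries put into a ThreadLocal map on a reused thread accumulate unless they are removed explicitly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakSketch {
    // Hypothetical stand-in for the per-thread map that holds MapWork objects.
    private static final ThreadLocal<Map<String, Object>> WORK_CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Hypothetical stand-in for getSplits: caches one "work" object per query.
    static void getSplits(String planId) {
        WORK_CACHE.get().put(planId, new byte[1024]);
    }

    // Drops the calling thread's map so its entries become collectable.
    static void clear() {
        WORK_CACHE.remove();
    }

    static int cachedCount() {
        return WORK_CACHE.get().size();
    }

    // Runs n "queries" on one long-lived daemon thread and returns how many
    // work objects that thread still holds afterwards.
    static int runQueries(int n, boolean cleanup) {
        ExecutorService loop = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "dag-scheduler-event-loop");
            t.setDaemon(true);
            return t;
        });
        try {
            for (int i = 0; i < n; i++) {
                final String planId = "query-" + i;
                Runnable task = () -> {
                    getSplits(planId);
                    if (cleanup) {
                        clear();
                    }
                };
                loop.submit(task).get();
            }
            Callable<Integer> probe = ThreadLocalLeakSketch::cachedCount;
            return loop.submit(probe).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + runQueries(5, false)); // 5
        System.out.println("with cleanup:    " + runQueries(5, true));  // 0
    }
}
```

Without the explicit clear() the count grows by one per query for as long as the thread lives, which matches the behaviour seen in RSC. A short-lived thread such as TaskRunner hides the same problem, because its ThreadLocal map is discarded together with the thread.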
> RSC has memory leak while execute multi queries.[Spark Branch]
> --------------------------------------------------------------
>
> Key: HIVE-10006
> URL: https://issues.apache.org/jira/browse/HIVE-10006
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Affects Versions: 1.1.0
> Reporter: Chengxiang Li
> Assignee: Chengxiang Li
> Priority: Critical
> Labels: Spark-M5
>
> While executing queries with RSC, the number of MapWork/ReduceWork instances
> keeps increasing and eventually leads to OOM.