[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366899#comment-14366899
 ] 

Chengxiang Li commented on HIVE-10006:
--------------------------------------

Root Cause:
In RSC, when Spark calls CombineHiveInputFormat::getSplits to split the job 
into tasks, the call runs in a thread named "dag-scheduler-event-loop". The 
MapWork is added to that thread's ThreadLocal map and is never removed. Because 
"dag-scheduler-event-loop" is a long-lived daemon thread, every MapWork stays 
referenced by the ThreadLocal map until the RSC JVM crashes or exits.
Hive hits the same issue in MR mode. It is only by luck that the thread which 
calls CombineHiveInputFormat::getSplits there is a TaskRunner, which is 
abandoned after the query finishes, so the Hive driver does not leak memory.
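
The leak pattern described above can be reproduced in isolation. The following 
is a minimal sketch, not Hive or Spark code: PLAN_CACHE, getSplits and the 
"event-loop-sketch" thread are hypothetical stand-ins for the per-thread plan 
map, CombineHiveInputFormat::getSplits and "dag-scheduler-event-loop". It only 
assumes that each call stores one object in a ThreadLocal map on a thread that 
outlives the query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakSketch {
    // Hypothetical stand-in for the per-thread plan map; not Hive's actual field.
    private static final ThreadLocal<Map<String, Object>> PLAN_CACHE =
        ThreadLocal.withInitial(HashMap::new);

    // Simulates a getSplits() call that caches a plan object in the calling thread.
    static void getSplits(String queryId, boolean cleanUp) {
        try {
            PLAN_CACHE.get().put(queryId, new byte[1024]); // stand-in for a MapWork
            // ... split computation would happen here ...
        } finally {
            if (cleanUp) {
                // Dropping the entry lets the plan be garbage collected
                // even though the thread itself keeps running.
                PLAN_CACHE.get().remove(queryId);
            }
        }
    }

    // Runs `queries` calls on one long-lived thread and returns how many
    // entries that thread's map still holds afterwards.
    static int entriesLeftAfter(int queries, boolean cleanUp) {
        ExecutorService loop = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "event-loop-sketch");
            t.setDaemon(true);
            return t;
        });
        try {
            for (int i = 0; i < queries; i++) {
                final String id = "query-" + i;
                Runnable call = () -> getSplits(id, cleanUp);
                loop.submit(call).get();
            }
            Callable<Integer> sizeTask = () -> PLAN_CACHE.get().size();
            return loop.submit(sizeTask).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + entriesLeftAfter(5, false));
        System.out.println("with cleanup:    " + entriesLeftAfter(5, true));
    }
}
```

Without cleanup the long-lived thread keeps one entry per query. With the 
removal in the finally block the map is empty again after each call. This 
illustrates one possible remedy and is not necessarily the fix applied in 
HIVE-10006.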

> RSC has memory leak while execute multi queries.[Spark Branch]
> --------------------------------------------------------------
>
>                 Key: HIVE-10006
>                 URL: https://issues.apache.org/jira/browse/HIVE-10006
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Critical
>              Labels: Spark-M5
>
> While executing queries with RSC, the number of MapWork/ReduceWork instances 
> keeps increasing, which eventually leads to OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
