[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370824#comment-14370824
 ] 

Chengxiang Li commented on HIVE-10006:
--------------------------------------

[~lirui], yes, we may get a memory leak if other HiveInputFormat implementations 
are used, but just cleaning the ThreadLocal cache in HiveInputFormat::init does 
not work, as there is other code in CombineHiveInputFormat which may invoke 
Utilities.getMapWork as well. I added an extra method in Utilities called 
getMapWorkWithoutCache in the latest patch, so that it does not cache MapWork in 
the ThreadLocal.
[~jxiang], they are not invoked in the same thread, so we need to clean the 
ThreadLocal cache separately.
Besides, is this ThreadLocal MapWork/ReduceWork cache a newly introduced 
optimization? As Utilities::getMapWork/getReduceWork/setMapWork/setReduceWork 
are widely used, it's very hard to identify which thread the related code will 
be executed in and to clean the ThreadLocal cache afterwards, so it may be quite 
easy to cause a memory leak if the methods are not used very carefully.
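
To illustrate the concern above, here is a minimal Java sketch of a per-thread 
plan cache. It is not Hive's actual code: the class ThreadLocalWorkCache, the 
Work type, clearWork, cachedCount and runQueries are hypothetical stand-ins, and 
only the name getMapWorkWithoutCache (shortened here to getWorkWithoutCache) 
comes from the comment. It assumes a long-lived thread that serves several 
queries, as a thread in the RSC would.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadLocalWorkCache {
    // Hypothetical stand-in for a plan object such as MapWork or ReduceWork.
    static final class Work {
        final String name;
        Work(String name) { this.name = name; }
    }

    // Each thread gets its own map; entries live as long as the thread does
    // unless that same thread removes them.
    private static final ThreadLocal<Map<String, Work>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Cached lookup: the first call on a thread stores the plan in that thread's map.
    static Work getWork(String planPath) {
        return CACHE.get().computeIfAbsent(planPath, Work::new);
    }

    // Uncached lookup: nothing is stored, so there is nothing to clean up later.
    static Work getWorkWithoutCache(String planPath) {
        return new Work(planPath);
    }

    // Clears the cache of the calling thread only; other threads are unaffected.
    static void clearWork() {
        CACHE.remove();
    }

    static int cachedCount() {
        return CACHE.get().size();
    }

    // Runs n "queries" on one worker thread and reports how many plans
    // remain cached on that thread afterwards.
    static int runQueries(int n, boolean cleanup) {
        int[] remaining = new int[1];
        Thread worker = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try {
                    getWork("query-" + i + "/map.xml");
                } finally {
                    if (cleanup) {
                        clearWork(); // must run on the thread that populated the cache
                    }
                }
            }
            remaining[0] = cachedCount();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return remaining[0];
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + runQueries(5, false)); // 5 plans retained
        System.out.println("with cleanup:    " + runQueries(5, true));  // 0 plans retained
    }
}
```

In this sketch the cleanup only helps because it runs on the same thread that 
filled the cache. A clearWork call from any other thread would leave the 
worker's entries in place, which is why each thread's cache has to be cleaned 
separately, or bypassed entirely with the uncached lookup.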

> RSC has memory leak while execute multi queries.[Spark Branch]
> --------------------------------------------------------------
>
>                 Key: HIVE-10006
>                 URL: https://issues.apache.org/jira/browse/HIVE-10006
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Critical
>              Labels: Spark-M5
>         Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, 
> HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, 
> HIVE-10006.5-spark.patch
>
>
> While executing queries with RSC, the number of MapWork/ReduceWork objects 
> keeps increasing, which eventually leads to OOM.


