[
https://issues.apache.org/jira/browse/SPARK-22374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261444#comment-16261444
]
Steve Loughran commented on SPARK-22374:
----------------------------------------
We need to do something about this; it is dangerously recurrent. Even though
the fix is {{closeAllForUGI}}, we should be able to track it better and start
warning early and meaningfully. For example (a rough sketch follows the list):
# record the creation timestamp of each cached filesystem
# track the total cache size as an instrumented value
# start warning when the cache gets big
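
Something along these lines, as a sketch only: the {{CACHE}} and {{map}} field names are Hadoop internals that may differ across versions, and the warning threshold is an arbitrary placeholder.

{code:scala}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation
import org.slf4j.LoggerFactory

// Sketch only: "CACHE" and "map" are Hadoop-internal field names and may
// change between releases; the threshold below is a placeholder to tune.
object FsCacheMonitor {
  private val log = LoggerFactory.getLogger(getClass)
  private val warnThreshold = 1000  // hypothetical; tune per deployment

  /** Size of the static FileSystem.CACHE, read via reflection. */
  def cacheSize(): Int = {
    val cacheField = classOf[FileSystem].getDeclaredField("CACHE")
    cacheField.setAccessible(true)
    val cache = cacheField.get(null)
    val mapField = cache.getClass.getDeclaredField("map")
    mapField.setAccessible(true)
    mapField.get(cache).asInstanceOf[java.util.Map[_, _]].size()
  }

  /** Warn once the cache looks like it is leaking per-UGI filesystems. */
  def checkAndWarn(): Unit = {
    val size = cacheSize()
    if (size > warnThreshold) {
      log.warn(s"FileSystem.CACHE holds $size entries; possible leak of " +
        "per-UGI filesystem instances (SPARK-22374)")
    }
  }

  /** The actual fix: drop the cached filesystems once a UGI is finished. */
  def cleanup(ugi: UserGroupInformation): Unit = {
    FileSystem.closeAllForUGI(ugi)
  }
}
{code}

Hooking something like {{checkAndWarn()}} into an existing metrics source would give the instrumented value above, while {{cleanup()}} is the {{closeAllForUGI}} path already mentioned.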
> STS ran into OOM in a secure cluster
> ------------------------------------
>
> Key: SPARK-22374
> URL: https://issues.apache.org/jira/browse/SPARK-22374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Dongjoon Hyun
> Attachments: 1.png, 2.png, 3.png
>
>
> In a secure cluster, FileSystem.CACHE grows indefinitely.
> *ENVIRONMENT*
> 1. `spark.yarn.principal` and `spark.yarn.keytab` are used.
> 2. Spark Thrift Server runs with `doAs` set to false.
> {code}
> <property>
> <name>hive.server2.enable.doAs</name>
> <value>false</value>
> </property>
> {code}
> With a 6GB heap (`-Xmx6144m`), `HiveConf` instances consume over 4GB inside FileSystem.CACHE.
> {code}
> 20,030 instances of "org.apache.hadoop.hive.conf.HiveConf", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0x64001c160" occupy 4,418,101,352
> (73.42%) bytes. These instances are referenced from one instance of
> "java.util.HashMap$Node[]", loaded by "<system class loader>"
> {code}
> Please see the attached images.