Thomas Friedrich created HIVE-14210:
---------------------------------------

             Summary: SSLFactory truststore reloader threads leaking in 
HiveServer2
                 Key: HIVE-14210
                 URL: https://issues.apache.org/jira/browse/HIVE-14210
             Project: Hive
          Issue Type: Bug
          Components: Hive, HiveServer2
    Affects Versions: 2.1.0, 2.0.0, 1.2.1
            Reporter: Thomas Friedrich


We found an issue in a customer environment where the HS2 crashed after a few 
days and the Java core dump contained several thousands of truststore reloader 
threads:

"Truststore reloader thread" #126 daemon prio=5 os_prio=0 
tid=0x00007f680d2e3000 nid=0x98fd waiting on 
condition [0x00007f67e482c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
(ReloadingX509TrustManager.java:225)
        at java.lang.Thread.run(Thread.java:745)

We found the issue to be caused by a bug in Hadoop where the TimelineClientImpl 
is not destroying the SSLFactory if SSL is enabled in Hadoop and the timeline 
server is running. I opened YARN-5309 which has more details on the problem, 
and a patch was submitted a few days back.

In addition to the changes in Hadoop, there are a couple of Hive changes 
required:
- ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
resources after the submitted job is done/failed
- Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 and 
MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both fixes are 
included in Hadoop 2.6.4. 
However, since we also need to pick up YARN-5309, we need to wait for a new 
release of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to