[
https://issues.apache.org/jira/browse/HDDS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458541#comment-17458541
]
Attila Doroszlai commented on HDDS-6102:
----------------------------------------
Thanks [~djcelis] for the detailed report. I think this is the same as
HDDS-5087, which has been fixed in 1.2.0. Is it possible for you to upgrade to
that version?
> Integrating Ozone with Hive produces a thread leak in the HS2 server
> --------------------------------------------------------------------
>
> Key: HDDS-6102
> URL: https://issues.apache.org/jira/browse/HDDS-6102
> Project: Apache Ozone
> Issue Type: Bug
> Components: OFS
> Affects Versions: 1.1.0
> Environment: ozone: 1.1.0
> hadoop: 3.1.1
> hive: 3.1.0
> tez: 0.10.0 (this version is needed because of
> [TEZ-4032|https://issues.apache.org/jira/browse/TEZ-4032])
> both the hadoop cluster + ozone are secured using kerberos.
> Reporter: Diego Jaramillo
> Priority: Major
>
> Integrating Ozone with Hive produces a thread leak in HS2. In this
> sample, 12 open connections to Hive produced 149 threads, and the count kept
> increasing until HS2 had to be restarted.
> SETTINGS:
> HDFS is integrated using the following settings:
> - viewfs-mount-table:
> fs.viewfs.mounttable.clusters.link./cluster1=hdfs://cluster1
> fs.viewfs.mounttable.clusters.link./ozfs1=ofs://ozfs1
> - core-site.xml:
> fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
> fs.AbstractFileSystem.o3fs.impl=org.apache.hadoop.fs.ozone.OzFs
>
> - hdfs-site.xml:
> ozone.om.service.ids=ozfs1
> ozone.om.nodes.ozfs1=om1,om2
> ozone.om.address.ozfs1.om1=ozone1.domain.com:9862
> ozone.om.address.ozfs1.om2=ozone2.domain.com:9862
> dfs.nameservices=cluster1,ozfs1
> ozone.om.kerberos.keytab.file=/etc/security/keytabs/om.service.keytab
> ozone.om.kerberos.principal=om/[email protected]
>
> Hive is integrated using the following settings:
> - hive-site.xml:
> tez.job.fs-servers=hdfs://cluster1,ofs://ozfs1
> mapreduce.job.hdfs-servers=hdfs://cluster1,ofs://ozfs1
>
> In Hive's stack trace we see many threads like these:
> {{Thread 4958 (Truststore reloader thread):}}
> {{ State: TIMED_WAITING}}
> {{ Blocked count: 0}}
> {{ Waited count: 221}}
> {{ Stack:}}
> {{ java.lang.Thread.$$YJP$$sleep(Native Method)}}
> {{ java.lang.Thread.sleep(Thread.java)}}
> {{ org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)}}
> {{ java.lang.Thread.run(Thread.java:748)}}
> {{Thread 4948 (Truststore reloader thread):}}
> {{ State: TIMED_WAITING}}
> {{ Blocked count: 79}}
> {{ Waited count: 221}}
> {{ Stack:}}
> {{ java.lang.Thread.$$YJP$$sleep(Native Method)}}
> {{ java.lang.Thread.sleep(Thread.java)}}
> {{ org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)}}
> {{ java.lang.Thread.run(Thread.java:748)}}
> {{Thread 4777 (Truststore reloader thread):}}
> {{ State: TIMED_WAITING}}
> {{ Blocked count: 0}}
> {{ Waited count: 252}}
> {{ Stack:}}
> {{ java.lang.Thread.$$YJP$$sleep(Native Method)}}
> {{ java.lang.Thread.sleep(Thread.java)}}
> {{ org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:195)}}
> {{ java.lang.Thread.run(Thread.java:748)}}
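A simple way to quantify this leak without a full jstack dump is to count live JVM threads by name. The helper below is a hypothetical diagnostic (not part of the report or of HS2); the class and method names are my own, and it only assumes the leaked threads keep the "Truststore reloader thread" name shown in the dumps above.

```java
// Hypothetical probe: count live threads whose name contains a given fragment.
// Run inside the HS2 JVM (e.g. from a test harness) before and after opening
// connections; a steadily growing count confirms the leak.
public class ThreadLeakProbe {
    static long countThreadsNamed(String fragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().contains(fragment))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Truststore reloader threads: "
                + countThreadsNamed("Truststore reloader thread"));
    }
}
```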
>
> Using YourKit we identified the following:
> {{java.lang.Thread.<init>(Runnable, String) Thread.java
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.init()
> ReloadingX509TrustManager.java:95
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(SSLFactory$Mode)
> FileBasedKeyStoresFactory.java:223
> org.apache.hadoop.security.ssl.SSLFactory.init() SSLFactory.java:180
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector.getSSLFactory(Configuration)
> TimelineConnector.java:181
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(Configuration)
> TimelineConnector.java:108
> org.apache.hadoop.service.AbstractService.init(Configuration)
> AbstractService.java:164
> org.apache.hadoop.service.CompositeService.serviceInit(Configuration)
> CompositeService.java:108
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(Configuration)
> TimelineClientImpl.java:130
> org.apache.hadoop.service.AbstractService.init(Configuration)
> AbstractService.java:164
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken()
> YarnClientImpl.java:405
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(ContainerLaunchContext)
> YarnClientImpl.java:381
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(ApplicationSubmissionContext)
> YarnClientImpl.java:300
> org.apache.tez.client.TezYarnClient.submitApplication(ApplicationSubmissionContext)
> TezYarnClient.java:77
> org.apache.tez.client.TezClient.start() TezClient.java:402
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezClient,
> HiveConf, Map, TezConfiguration, boolean) TezSessionState.java:516
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(String[],
> boolean, SessionState$LogHelper, TezSessionState$HiveResources)
> TezSessionState.java:451
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(String[],
> boolean, SessionState$LogHelper, TezSessionState$HiveResources)
> TezSessionPoolSession.java:124
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(String[])
> TezSessionState.java:373
> org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezSessionState,
> String[]) TezTask.java:373
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(DriverContext)
> TezTask.java:200
> org.apache.hadoop.hive.ql.exec.Task.executeTask(HiveHistory) Task.java:212
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential() TaskRunner.java:103
> org.apache.hadoop.hive.ql.Driver.launchTask(Task, String, boolean, String,
> int, DriverContext) Driver.java:2712
> org.apache.hadoop.hive.ql.Driver.execute() Driver.java:2383
> org.apache.hadoop.hive.ql.Driver.runInternal(String, boolean) Driver.java:2055
> org.apache.hadoop.hive.ql.Driver.run(String, boolean) Driver.java:1753
> org.apache.hadoop.hive.ql.Driver.run() Driver.java:1747
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run() ReExecDriver.java:157
> org.apache.hive.service.cli.operation.SQLOperation.runQuery()
> SQLOperation.java:226
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation)
> SQLOperation.java:87
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run()
> SQLOperation.java:324
> javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction)
> Subject.java
> org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction)
> UserGroupInformation.java:1729
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run()
> SQLOperation.java:342
> java.util.concurrent.Executors$RunnableAdapter.call() Executors.java:511
> java.util.concurrent.FutureTask.run() FutureTask.java:266
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
> ThreadPoolExecutor.java:1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run()
> ThreadPoolExecutor.java:624
> java.lang.Thread.run() Thread.java:748
> ----
> java.lang.Thread.<init>(Runnable, String) Thread.java
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.init()
> ReloadingX509TrustManager.java:95
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(SSLFactory$Mode)
> FileBasedKeyStoresFactory.java:223
> org.apache.hadoop.security.ssl.SSLFactory.init() SSLFactory.java:180
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.<init>(URI, Configuration)
> KMSClientProvider.java:390
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProviders(Configuration,
> URL, int, String) KMSClientProvider.java:318
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(URI,
> Configuration) KMSClientProvider.java:303
> org.apache.hadoop.crypto.key.KeyProviderFactory.get(URI, Configuration)
> KeyProviderFactory.java:96
> org.apache.hadoop.util.KMSUtil.createKeyProviderFromUri(Configuration, URI)
> KMSUtil.java:83
> org.apache.hadoop.ozone.client.rpc.OzoneKMSUtil.getKeyProvider(ConfigurationSource,
> URI) OzoneKMSUtil.java:138
> org.apache.hadoop.ozone.client.rpc.RpcClient.getKeyProvider()
> RpcClient.java:1310
> org.apache.hadoop.ozone.client.ObjectStore.getKeyProvider()
> ObjectStore.java:222
> org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getKeyProvider()
> BasicRootedOzoneClientAdapterImpl.java:785
> org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.getKeyProvider()
> RootedOzoneFileSystem.java:54
> org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.getAdditionalTokenIssuers()
> RootedOzoneFileSystem.java:67
> org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer,
> String, Credentials, List) DelegationTokenIssuer.java:104
> org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(String,
> Credentials) DelegationTokenIssuer.java:76
> org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(FileSystem,
> Credentials, Configuration) TokenCache.java:140
> org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(Credentials,
> Path[], Configuration) TokenCache.java:101
> org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(Credentials,
> Path[], Configuration) TokenCache.java:77
> org.apache.tez.client.TezClientUtils.populateTokenCache(TezConfiguration,
> Credentials) TezClientUtils.java:746
> org.apache.tez.client.TezClientUtils.prepareAmLaunchCredentials(AMConfiguration,
> Credentials, TezConfiguration, Path) TezClientUtils.java:722
> org.apache.tez.client.TezClientUtils.createApplicationSubmissionContext(ApplicationId,
> DAG, String, AMConfiguration, Map, Credentials, boolean, TezApiVersionInfo,
> ServicePluginsDescriptor, JavaOptsChecker) TezClientUtils.java:487
> org.apache.tez.client.TezClient.setupApplicationContext() TezClient.java:501
> org.apache.tez.client.TezClient.start() TezClient.java:401
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezClient,
> HiveConf, Map, TezConfiguration, boolean) TezSessionState.java:516
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(String[],
> boolean, SessionState$LogHelper, TezSessionState$HiveResources)
> TezSessionState.java:451
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(String[],
> boolean, SessionState$LogHelper, TezSessionState$HiveResources)
> TezSessionPoolSession.java:124
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(String[])
> TezSessionState.java:373
> org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezSessionState,
> String[]) TezTask.java:373
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(DriverContext)
> TezTask.java:200
> org.apache.hadoop.hive.ql.exec.Task.executeTask(HiveHistory) Task.java:212
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential() TaskRunner.java:103
> org.apache.hadoop.hive.ql.Driver.launchTask(Task, String, boolean, String,
> int, DriverContext) Driver.java:2712
> org.apache.hadoop.hive.ql.Driver.execute() Driver.java:2383
> org.apache.hadoop.hive.ql.Driver.runInternal(String, boolean) Driver.java:2055
> org.apache.hadoop.hive.ql.Driver.run(String, boolean) Driver.java:1753
> org.apache.hadoop.hive.ql.Driver.run() Driver.java:1747
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run() ReExecDriver.java:157
> org.apache.hive.service.cli.operation.SQLOperation.runQuery()
> SQLOperation.java:226
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation)
> SQLOperation.java:87
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run()
> SQLOperation.java:324
> javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction)
> Subject.java
> org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction)
> UserGroupInformation.java:1729
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run()
> SQLOperation.java:342
> java.util.concurrent.Executors$RunnableAdapter.call() Executors.java:511
> java.util.concurrent.FutureTask.run() FutureTask.java:266
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
> ThreadPoolExecutor.java:1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run()
> ThreadPoolExecutor.java:624
> java.lang.Thread.run() Thread.java:748}}
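Both allocation stacks above end in SSLFactory.init(), which starts a ReloadingX509TrustManager thread, and neither path shows a matching SSLFactory.destroy(). The snippet below is an assumed analogue of that pattern, not Hadoop code (the class, method names, and 10-second interval are illustrative): each "session init" starts a background reloader thread, and with no shutdown call the threads accumulate one per query.

```java
// Illustrative analogue of the leak: a periodic reloader thread is started on
// every session init and never stopped (no interrupt()/join(), the equivalent
// of the missing SSLFactory.destroy()).
public class ReloaderLeakDemo {
    static Thread startReloader(int id) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10_000);  // stand-in for the reload interval
                }
            } catch (InterruptedException ignored) {
                // a real fix would land here via a destroy()-style call
            }
        }, "Truststore reloader thread-" + id);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        // Simulate 12 open HS2 connections, each initializing its own factory:
        for (int i = 0; i < 12; i++) {
            startReloader(i);  // leaked: no reference kept, never interrupted
        }
        long leaked = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("Truststore reloader"))
                .count();
        System.out.println("leaked reloader threads: " + leaked);
    }
}
```

The fix direction taken by HDFS-14037 (and by HDDS-5087 per the comment above) is to make the owner of the SSLFactory call its destroy/close path, which interrupts the reloader thread.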
>
> This looks similar to
> [HDFS-14037|https://issues.apache.org/jira/browse/HDFS-14037].
--
This message was sent by Atlassian Jira
(v8.20.1#820001)