[
https://issues.apache.org/jira/browse/HIVE-24800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308318#comment-17308318
]
Wenping Zhan edited comment on HIVE-24800 at 3/25/21, 3:08 AM:
---------------------------------------------------------------
I can think of two ways to solve this issue:
1. Enable Spark dynamic allocation, so this code path is not triggered at all (a config sketch follows the code block below).
2. Rethrow the InterruptedException (wrapped in a SemanticException) directly, e.g.:
{code:java}
private void getSparkMemoryAndCores(OptimizeSparkProcContext context) throws SemanticException {
  if (sparkMemoryAndCores != null) {
    return;
  }
  if (context.getConf().getBoolean(SPARK_DYNAMIC_ALLOCATION_ENABLED, false)) {
    // If dynamic allocation is enabled, numbers for memory and cores are meaningless.
    // So, we don't try to get it.
    sparkMemoryAndCores = null;
    return;
  }

  SparkSessionManager sparkSessionManager = null;
  SparkSession sparkSession = null;
  try {
    sparkSessionManager = SparkSessionManagerImpl.getInstance();
    sparkSession = SparkUtilities.getSparkSession(context.getConf(), sparkSessionManager);
    sparkMemoryAndCores = sparkSession.getMemoryAndCores();
  } catch (HiveException e) {
    throw new SemanticException("Failed to get a spark session: " + e);
  } catch (Exception e) {
    LOG.warn("Failed to get spark memory/core info", e);
    // Do not swallow the interruption: fail semantic analysis instead of
    // continuing with a half-built plan.
    if (e instanceof InterruptedException) {
      throw new SemanticException("Semantic analysis failed as current thread is interrupted: " + e);
    }
  } finally {
    if (sparkSession != null && sparkSessionManager != null) {
      try {
        sparkSessionManager.returnSession(sparkSession);
      } catch (HiveException ex) {
        LOG.error("Failed to return the session to SessionManager: " + ex, ex);
      }
    }
  }
}
{code}
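For option 1, a minimal sketch of the settings involved (these are the standard Spark dynamic-allocation properties and assume the external shuffle service is set up on the NodeManagers; the executor counts are placeholder values for illustration, not recommendations):
{code}
-- set in the Hive session (or in spark-defaults.conf / hive-site.xml):
set spark.dynamicAllocation.enabled=true;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.minExecutors=1;
set spark.dynamicAllocation.maxExecutors=50;
{code}
With dynamic allocation on, getSparkMemoryAndCores() takes the early-return branch shown above, so the blocking getExecutorCount() call from the stack trace is never reached.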
Can anyone give me more suggestions? Thanks!
> Yarn cluster resource can't be released when closing JDBC connection to HiveServer2
> ----------------------------------------------------------------------------------
>
> Key: HIVE-24800
> URL: https://issues.apache.org/jira/browse/HIVE-24800
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, JDBC, Spark
> Affects Versions: 2.3.6
> Reporter: Wenping Zhan
> Priority: Critical
>
> When I submit an application to the YARN cluster via Hive on Spark, I find that
> the YARN resources the application has acquired can't be released after the JDBC
> connection to HiveServer2 is closed. I think multiple threads operating on the
> same connection cause this issue: one thread creates the JDBC connection and
> starts compiling the HQL, while another thread interrupts the connection while
> the DAG is being created (a minimal reproduction sketch follows below). I get the
> error info below and can always reproduce this issue. Can you help me confirm
> whether it is a bug? [~xuefuz], and if it is, what should I do?
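> A minimal reproduction sketch of that scenario, assuming the standard Hive JDBC
> driver; the host, credentials, query, and timing are placeholders, not our actual
> client code:
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.SQLException;
> import java.sql.Statement;
>
> public class Hive24800Repro {
>   public static void main(String[] args) throws Exception {
>     // Thread A owns the connection and submits a query whose compilation goes
>     // through SparkCompiler / SetSparkReducerParallelism on the HiveServer2 side.
>     Connection conn = DriverManager.getConnection(
>         "jdbc:hive2://hs2-host:10000/default", "hadoop", "");
>
>     Thread worker = new Thread(() -> {
>       try (Statement stmt = conn.createStatement()) {
>         stmt.execute("SELECT some_col, count(*) FROM big_table GROUP BY some_col");
>       } catch (SQLException e) {
>         e.printStackTrace();
>       }
>     });
>     worker.start();
>
>     // Thread B (here, the main thread) closes the connection while HS2 is still
>     // compiling / building the DAG, which interrupts the background compile thread.
>     Thread.sleep(5000);
>     conn.close();
>     worker.join();
>   }
> }
> {code}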
> *error info:*
> 2020-12-10T10:24:30,069 INFO [HiveServer2-Handler-Pool: Thread-824055]
> session.SessionState: Updating thread name to
> 11bf6a51-9a3b-488b-885e-fc50af485a64 HiveServer2-Handler-Pool: Thread-824055
> 2020-12-10T10:24:30,069 INFO [11bf6a51-9a3b-488b-885e-fc50af485a64
> HiveServer2-Handler-Pool: Thread-824055] operation.OperationManager: Closing
> operation: OperationHandle [opType=EXECUTE_STATEMENT,
> getHandleIdentifier()=df4e1322-c60a-4c28-83e3-c19ae9243404]
> 2020-12-10T10:24:30,069 INFO [11bf6a51-9a3b-488b-885e-fc50af485a64
> HiveServer2-Handler-Pool: Thread-824055] operation.Operation: The running
> operation has been successfully interrupted: null
> 2020-12-10T10:24:30,070 INFO [11bf6a51-9a3b-488b-885e-fc50af485a64
> HiveServer2-Handler-Pool: Thread-824055] session.HiveSessionImpl: Operation
> log session directory is deleted:
> /cloud/data/hive/tmp/hadoop/operation_logs/11bf6a51-9a3b-488b-885e-fc50af485a64
> 2020-12-10T10:24:30,070 INFO [11bf6a51-9a3b-488b-885e-fc50af485a64
> HiveServer2-Handler-Pool: Thread-824055] conf.HiveConf: Using the default
> value passed in for log id: 11bf6a51-9a3b-488b-885e-fc50af485a64
> 2020-12-10T10:24:30,070 INFO [11bf6a51-9a3b-488b-885e-fc50af485a64
> HiveServer2-Handler-Pool: Thread-824055] session.SessionState: Resetting
> thread name to HiveServer2-Handler-Pool: Thread-824055
> 2020-12-10T10:24:30,071 WARN [RPC-Handler-7] rpc.RpcDispatcher:
> [ClientProtocol] Closing RPC channel with 2 outstanding RPCs.
> 2020-12-10T10:24:30,070 WARN [HiveServer2-Background-Pool: Thread-824059]
> spark.SetSparkReducerParallelism: Failed to get spark memory/core info
> java.lang.InterruptedException
> at java.lang.Object.wait(Native Method) ~[?:1.8.0_231]
> at java.lang.Object.wait(Object.java:460) ~[?:1.8.0_231]
> at
> io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:600)
> ~[netty-all-4.0.52.Final.jar:4.0.52.Final]
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:273)
> ~[netty-all-4.0.52.Final.jar:4.0.52.Final]
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:46)
> ~[netty-all-4.0.52.Final.jar:4.0.52.Final]
> at
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:156)
> ~[hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:166)
> ~[hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
> ~[hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:238)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> [hive-exec-2.3.6.jar:2.3.6]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> [hive-exec-2.3.6.jar:2.3.6]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> [hive-exec-2.3.6.jar:2.3.6]
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
> [hive-exec-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
> [hive-service-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:346)
> [hive-service-2.3.6.jar:2.3.6]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[?:1.8.0_231]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_231]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> [hadoop-common-2.8.5.jar:?]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
> [hive-service-2.3.6.jar:2.3.6]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_231]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [?:1.8.0_231]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_231]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_231]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
> ...
> ...
> ...
> 2020-12-10T10:24:43,770 INFO [HiveServer2-Background-Pool: Thread-824059]
> parse.CalcitePlanner: Completed plan generation
> 2020-12-10T10:24:43,770 INFO [HiveServer2-Background-Pool: Thread-824059]
> ql.Driver: Semantic Analysis Completed
> 2020-12-10T10:24:43,770 ERROR [HiveServer2-Background-Pool: Thread-824059]
> ql.Driver: FAILED: command has been interrupted: after analyzing query.
> 2020-12-10T10:24:43,774 INFO [HiveServer2-Background-Pool: Thread-824059]
> ql.Driver: Compiling
> command(queryId=hadoop_20201210102414_27d30b57-1633-4c8a-b326-852b70bb2c2d)
> has been interrupted after 28.804 seconds
> 2020-12-10T10:24:43,774 ERROR [HiveServer2-Background-Pool: Thread-824059]
> operation.Operation: Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Illegal Operation state
> transition from CLOSED to ERROR
> at
> org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:96)
> ~[hive-service-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:102)
> ~[hive-service-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.operation.Operation.setState(Operation.java:164)
> ~[hive-service-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:234)
> ~[hive-service-2.3.6.jar:2.3.6]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:346)
> [hive-service-2.3.6.jar:2.3.6]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[?:1.8.0_231]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_231]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> [hadoop-common-2.8.5.jar:?]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
> [hive-service-2.3.6.jar:2.3.6]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_231]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [?:1.8.0_231]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_231]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_231]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)