This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 115f777 [SPARK-21449][SQL][FOLLOWUP] Avoid logging undesirable IllegalStateException when closing state
115f777 is described below
commit 115f777cb0a9dff78497bad9b64daa5da1ee0e51
Author: Kent Yao <[email protected]>
AuthorDate: Wed Mar 17 15:21:23 2021 +0800
[SPARK-21449][SQL][FOLLOWUP] Avoid logging undesirable IllegalStateException when closing state
### What changes were proposed in this pull request?
`TmpOutputFile` and `TmpErrOutputFile` are registered in `o.a.h.u.ShutdownHookManager` during creation. `state.close()` deletes them if they are not null and tries to remove them from `o.a.h.u.ShutdownHookManager`, which causes an IllegalStateException when we also call it from our own ShutdownHookManager.

In this PR, we delete them first via a high-priority hook in Spark and set them to null to bypass the deletion and cancellation in `state.close()`.
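
To make the failure mode concrete, here is a minimal Scala sketch (illustrative only, with hypothetical names; not code from Hive or this PR) of a shutdown-hook manager that refuses `cancelDeleteOnExit` once shutdown has begun, mirroring the Hive-side behavior in the stack trace under "How was this patch tested?":

```scala
import java.io.File
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable

// Hypothetical stand-in for o.a.h.u.ShutdownHookManager's deleteOnExit tracking.
object SketchHookManager {
  private val shutdownInProgress = new AtomicBoolean(false)
  private val deleteOnExit = mutable.Set.empty[File]

  def registerDeleteOnExit(f: File): Unit = synchronized { deleteOnExit += f }

  def cancelDeleteOnExit(f: File): Unit = synchronized {
    // Once the JVM shutdown sequence has started, cancellation is refused;
    // this is the IllegalStateException shown in the log below.
    if (shutdownInProgress.get()) {
      throw new IllegalStateException("Shutdown in progress, cannot cancel a deleteOnExit")
    }
    deleteOnExit -= f
  }

  def startShutdown(): Unit = shutdownInProgress.set(true)
}
```

Deleting the temp files ahead of time and nulling the fields means `state.close()` never reaches `cancelDeleteOnExit` during shutdown, so the exception cannot be thrown.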
### Why are the changes needed?
With or without this PR, the deletion of these files is unaffected; we just mute an undesirable error log here.
### Does this PR introduce _any_ user-facing change?
No, this is a follow-up.
### How was this patch tested?
#### The undesirable error is gone
```
spark-sql> 21/03/16 18:41:31 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.IllegalStateException: Shutdown in progress, cannot cancel a deleteOnExit
	at org.apache.hive.common.util.ShutdownHookManager.cancelDeleteOnExit(ShutdownHookManager.java:106)
	at org.apache.hadoop.hive.common.FileUtils.deleteTmpFile(FileUtils.java:861)
	at org.apache.hadoop.hive.ql.session.SessionState.deleteTmpErrOutputFile(SessionState.java:325)
	at org.apache.hadoop.hive.ql.session.SessionState.dropSessionPaths(SessionState.java:829)
	at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1585)
	at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66)
	at org.apache.spark.sql.hive.client.HiveClientImpl.closeState(HiveClientImpl.scala:172)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$new$1(HiveClientImpl.scala:175)
	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1994)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
(python) ✘ kentyaohulk ~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316 cd ..
(python) kentyaohulk ~/Downloads/spark tar zxf spark-3.2.0-SNAPSHOT-bin-20210316.tgz
(python) kentyaohulk ~/Downloads/spark cd -
~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316
(python) kentyaohulk ~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316 bin/spark-sql --conf spark.local.dir=./local --conf spark.hive.exec.local.scratchdir=./local
21/03/16 18:42:15 WARN Utils: Your hostname, hulk.local resolves to a loopback address: 127.0.0.1; using 10.242.189.214 instead (on interface en0)
21/03/16 18:42:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/03/16 18:42:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/16 18:42:16 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
21/03/16 18:42:18 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/03/16 18:42:18 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/03/16 18:42:19 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/03/16 18:42:19 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore kentyao@127.0.0.1
Spark master: local[*], Application Id: local-1615891336877
spark-sql> %
```
#### And the deletion still works
```shell
kentyaohulk ~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316 ls -al local
total 0
drwxr-xr-x   7 kentyao  staff  224  3 16 18:42 .
drwxr-xr-x  19 kentyao  staff  608  3 16 18:42 ..
drwx------   2 kentyao  staff   64  3 16 18:42 16cc5238-e25e-4c0f-96ef-0c4bdecc7e51
-rw-r--r--   1 kentyao  staff    0  3 16 18:42 16cc5238-e25e-4c0f-96ef-0c4bdecc7e51219959790473242539.pipeout
-rw-r--r--   1 kentyao  staff    0  3 16 18:42 16cc5238-e25e-4c0f-96ef-0c4bdecc7e518816377057377724129.pipeout
drwxr-xr-x   2 kentyao  staff   64  3 16 18:42 blockmgr-37a52ad2-eb56-43a5-8803-8f58d08fe9ad
drwx------   3 kentyao  staff   96  3 16 18:42 spark-101971df-f754-47c2-8764-58c45586be7e
kentyaohulk ~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316 ls -al local
total 0
drwxr-xr-x   2 kentyao  staff   64  3 16 19:22 .
drwxr-xr-x  19 kentyao  staff  608  3 16 18:42 ..
kentyaohulk ~/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210316
```
Closes #31850 from yaooqinn/followup.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../apache/spark/sql/hive/client/HiveClientImpl.scala | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 800c3ca..35dd2c1 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -155,7 +155,24 @@ private[hive] class HiveClientImpl(
     }
   }
 
-  ShutdownHookManager.addShutdownHook(() => state.close())
+  private def closeState(): Unit = {
+    // These temp files are registered in o.a.h.u.ShutdownHookManager too during state start.
+    // state.close() will delete them if they are not null and try to remove them from
+    // o.a.h.u.ShutdownHookManager, which causes an undesirable IllegalStateException.
+    // We delete them ahead with a high-priority hook here and set them to null to bypass the
+    // deletion and cancellation in state.close().
+    if (state.getTmpOutputFile != null) {
+      state.getTmpOutputFile.delete()
+      state.setTmpOutputFile(null)
+    }
+    if (state.getTmpErrOutputFile != null) {
+      state.getTmpErrOutputFile.delete()
+      state.setTmpErrOutputFile(null)
+    }
+    state.close()
+  }
+
+  ShutdownHookManager.addShutdownHook(() => closeState())
 
   // Log the default warehouse location.
   logInfo(
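
The description above calls this a "high priority hook". As a minimal sketch (illustrative only; the committed code uses the default-priority overload shown in the diff), registering a hook with an explicit priority via Spark's internal `ShutdownHookManager` would look like this:

```scala
import org.apache.spark.util.ShutdownHookManager

// Illustrative sketch: a hook registered with a priority above the default
// (DEFAULT_SHUTDOWN_PRIORITY = 100) runs earlier during JVM shutdown.
// `closeState` stands for the method added in the diff above.
def registerCloseHook(closeState: () => Unit): AnyRef = {
  ShutdownHookManager.addShutdownHook(ShutdownHookManager.DEFAULT_SHUTDOWN_PRIORITY + 1) { () =>
    closeState()
  }
}
```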
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]