[
https://issues.apache.org/jira/browse/SPARK-41313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xing Lin updated SPARK-41313:
-----------------------------
Description:
SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir() in
ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally
reverted that change while fixing the "Wrong FS" bug. We are now seeing the
SPARK-3900 issue reported by our users at LinkedIn, so we need to bring back
that fix.
The IllegalStateException when creating a new FileSystem object stems from a
limitation in Hadoop: a shutdown hook cannot be registered once shutdown is
already in progress. When a Spark job fails during pre-launch,
cleanupStagingDir() is called as part of shutdown. If that call creates a
FileSystem object for the first time, Hadoop tries to register a hook to shut
down the KeyProviderCache while creating the ClientContext for the DFSClient,
and we hit the IllegalStateException.
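A minimal sketch of the failure mode (hypothetical, not from the Spark code
base; the HDFS URI and object names are placeholders):
{code:scala}
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object ShutdownHookRepro {
  def main(args: Array[String]): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(() => {
      // First-ever FileSystem creation happens during JVM shutdown:
      // DFSClient -> ClientContext -> KeyProviderCache tries to register its
      // own Hadoop shutdown hook and throws IllegalStateException.
      val fs = FileSystem.get(new URI("hdfs://namenode:8020/"), new Configuration())
      fs.close()
    }))
    // main() returns without ever touching FileSystem, so the first creation
    // is deferred until the shutdown hook above runs.
  }
}
{code}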
We should therefore avoid creating a new FileSystem object in
cleanupStagingDir() when it is called from a shutdown hook. That is exactly
the fix SPARK-3900 introduced and SPARK-21138 accidentally reverted; we need
to bring it back to avoid the IllegalStateException.
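The shape of the combined fix could look like the following sketch
(illustrative only; StagingDirCleaner and stagingDirFs are made-up names, not
the actual patch): the FileSystem is resolved from the staging path eagerly at
startup, so the shutdown hook never has to create one.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

class StagingDirCleaner(stagingDirPath: Path, yarnConf: Configuration) {
  // Created eagerly at startup, outside any shutdown hook. Deriving the FS
  // from the staging path itself (rather than FileSystem.get(yarnConf))
  // preserves the SPARK-21138 "Wrong FS" fix.
  private val stagingDirFs: FileSystem = stagingDirPath.getFileSystem(yarnConf)

  // Safe to call from a shutdown hook: no new FileSystem (and hence no new
  // DFSClient/ClientContext/KeyProviderCache shutdown hook) is created here.
  def cleanupStagingDir(): Unit = {
    stagingDirFs.delete(stagingDirPath, true)
  }
}
{code}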
was: SPARK-3900 fixed the IllegalStateException in cleanupStagingDir() in
ApplicationMaster's shutdown hook. However, SPARK-21138 reverted that change
when fixing the "Wrong FS" bug. We need both fixes.
> Combine fixes for SPARK-3900 and SPARK-21138
> --------------------------------------------
>
> Key: SPARK-41313
> URL: https://issues.apache.org/jira/browse/SPARK-41313
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 3.4.0
> Reporter: Xing Lin
> Priority: Major
>
> SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir() in
> ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally
> reverted that change while fixing the "Wrong FS" bug. We are now seeing the
> SPARK-3900 issue reported by our users at LinkedIn, so we need to bring back
> that fix.
> The IllegalStateException when creating a new FileSystem object stems from a
> limitation in Hadoop: a shutdown hook cannot be registered once shutdown is
> already in progress. When a Spark job fails during pre-launch,
> cleanupStagingDir() is called as part of shutdown. If that call creates a
> FileSystem object for the first time, Hadoop tries to register a hook to
> shut down the KeyProviderCache while creating the ClientContext for the
> DFSClient, and we hit the IllegalStateException.
> We should therefore avoid creating a new FileSystem object in
> cleanupStagingDir() when it is called from a shutdown hook. That is exactly
> the fix SPARK-3900 introduced and SPARK-21138 accidentally reverted; we need
> to bring it back to avoid the IllegalStateException.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)