Rui Li created SPARK-14958:
------------------------------
Summary: Failed task hangs if error is encountered when getting task result
Key: SPARK-14958
URL: https://issues.apache.org/jira/browse/SPARK-14958
Project: Spark
Issue Type: Bug
Reporter: Rui Li
In {{TaskResultGetter}}, if an error is thrown while deserializing the
{{TaskEndReason}}, the TaskScheduler never gets a chance to handle the failed
task, and the task just hangs.
{code}
def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
    serializedData: ByteBuffer) {
  var reason : TaskEndReason = UnknownReason
  try {
    getTaskResultExecutor.execute(new Runnable {
      override def run(): Unit = Utils.logUncaughtExceptions {
        val loader = Utils.getContextOrSparkClassLoader
        try {
          if (serializedData != null && serializedData.limit() > 0) {
            reason = serializer.get().deserialize[TaskEndReason](
              serializedData, loader)
          }
        } catch {
          case cnd: ClassNotFoundException =>
            // Log an error but keep going here -- the task failed, so not catastrophic
            // if we can't deserialize the reason.
            logError(
              "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
          case ex: Exception => {}
        }
        scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
      }
    })
  } catch {
    case e: RejectedExecutionException if sparkEnv.isStopped =>
      // ignore it
  }
}
{code}
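In my specific case, I got a NoClassDefFoundError. It is an Error rather than
an Exception, so neither catch clause matched, the call to
{{scheduler.handleFailedTask}} was skipped, and the failed task hung forever.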
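
To illustrate the failure mode, here is a minimal standalone sketch. It is not
the actual Spark code or a proposed patch: {{handleFailedTask}} and
{{deserializeReason}} below are hypothetical stand-ins, and the reason is
modelled as a plain String. It only shows that an Error bypasses a
{{case ex: Exception}} clause, and that moving the scheduler callback into a
{{finally}} block is one possible way to make sure it still runs.

```scala
import java.util.concurrent.atomic.AtomicInteger

object FailedTaskSketch {
  // Hypothetical stand-in for scheduler.handleFailedTask: counts invocations.
  val handled = new AtomicInteger(0)
  def handleFailedTask(reason: String): Unit = handled.incrementAndGet()

  // Hypothetical stand-in for a deserializer that hits a linkage error.
  def deserializeReason(): String =
    throw new NoClassDefFoundError("some/missing/Class")

  // Mirrors the structure quoted above: only Exception subclasses are caught,
  // so an Error propagates and the callback after the try is never reached.
  def currentBehaviour(): Unit = {
    var reason = "UnknownReason"
    try {
      reason = deserializeReason()
    } catch {
      case _: ClassNotFoundException => // the original logs an error here
      case _: Exception =>              // swallowed, as in the original
    }
    handleFailedTask(reason)
  }

  // One possible fix (an assumption, not the committed change): run the
  // callback in finally so it executes even when an Error is thrown.
  // The Error itself still propagates to the caller afterwards.
  def withFinally(): Unit = {
    var reason = "UnknownReason"
    try {
      reason = deserializeReason()
    } catch {
      case _: ClassNotFoundException =>
      case _: Exception =>
    } finally {
      handleFailedTask(reason)
    }
  }
}
```

With this sketch, {{currentBehaviour()}} throws without incrementing
{{handled}}, while {{withFinally()}} increments it once before the error
propagates. In the real code the propagated error would then reach
{{Utils.logUncaughtExceptions}}.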