[
https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068814#comment-14068814
]
Kousuke Saruta commented on SPARK-2583:
---------------------------------------
Hi [~pwendell],
When I simulate disk fault on shuffle, I saw following 2 behaviors.
#1 is related to this topic and #2 is about this topic.
1) Fetching from executor locally itself
To simulate a case of disk fault, I deleted bucket file.
FileNotFoundException was thrown and
after retry,
2)
> ConnectionManager cannot distinguish whether error occurred or not
> ------------------------------------------------------------------
>
> Key: SPARK-2583
> URL: https://issues.apache.org/jira/browse/SPARK-2583
> Project: Spark
> Issue Type: Bug
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Critical
>
> ConnectionManager#handleMessage sent empty messages to another peer if some
> error occurred or not in onReceiveCalback.
> {code}
> val ackMessage = if (onReceiveCallback != null) {
> logDebug("Calling back")
> onReceiveCallback(bufferMessage, connectionManagerId)
> } else {
> logDebug("Not calling back as callback is null")
> None
> }
> if (ackMessage.isDefined) {
> if (!ackMessage.get.isInstanceOf[BufferMessage]) {
> logDebug("Response to " + bufferMessage + " is not a buffer
> message, it is of type "
> + ackMessage.get.getClass)
> } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) {
> logDebug("Response to " + bufferMessage + " does not have ack
> id set")
> ackMessage.get.asInstanceOf[BufferMessage].ackId =
> bufferMessage.id
> }
> }
> // We have no way to tell peer whether error occurred or not
> sendMessage(connectionManagerId, ackMessage.getOrElse {
> Message.createBufferMessage(bufferMessage.id)
> })
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)