This is an automated email from the ASF dual-hosted git repository.
wuyi pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new e3d868b [SPARK-37060][CORE][3.1] Handle driver status response from backup masters
e3d868b is described below
commit e3d868be43e2cdaaa66a3a3e05c73eaf66109a3e
Author: Mohamadreza Rostami <[email protected]>
AuthorDate: Thu Dec 16 15:21:45 2021 +0800
[SPARK-37060][CORE][3.1] Handle driver status response from backup masters
### What changes were proposed in this pull request?
After an improvement in SPARK-31486, the contributor used the
'asyncSendToMasterAndForwardReply' method instead of
'activeMasterEndpoint.askSync' to get the driver's status. Since the driver's
status is only available on the active master, and the
'asyncSendToMasterAndForwardReply' method iterates over all of the masters, we
have to handle the responses from the backup masters in the client, which the
developer did not consider in the SPARK-31486 change. So drivers running in
cluster mode and on a [...]
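The filtering idea behind the fix can be sketched in isolation. The snippet below is a hypothetical, self-contained sketch (not Spark's actual classes); it assumes, as Spark's `Utils.responseFromBackup` does, that a standby master's reply message starts with a known prefix, so a status response carrying such an error can be ignored instead of being treated as "driver not recognized":

```scala
// Hypothetical sketch of the client-side decision, not Spark's real code.
object BackupMasterFilter {
  // Assumption: standby masters reply with a message starting with this
  // prefix (mirroring Spark's Utils.BACKUP_STANDALONE_MASTER_PREFIX).
  val BackupPrefix = "Current state is not alive"

  // True if an error message is the canned reply of a backup (standby) master.
  def responseFromBackup(msg: String): Boolean = msg.startsWith(BackupPrefix)

  // Classify a driver-status reply: status found -> report it;
  // error from a backup master -> ignore; anything else -> real failure.
  def classify(found: Boolean, exception: Option[Exception]): String =
    if (found) "report-status"
    else if (exception.exists(e => responseFromBackup(e.getMessage))) "ignore-backup"
    else "unknown-driver"

  def main(args: Array[String]): Unit = {
    // A backup master's error is ignored rather than killing the client.
    println(classify(found = false,
      Some(new Exception("Current state is not alive: STANDBY"))))
    // No status and no backup-master error: genuinely unknown driver.
    println(classify(found = false, None))
  }
}
```

Before the fix, the first case fell through to the `logError` / `System.exit(-1)` branch, which is why cluster-mode submissions against multi-master clusters failed.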
### Why are the changes needed?
The client needs to detect whether a status response came from a backup
master and, if so, ignore it.
### Does this PR introduce _any_ user-facing change?
No. It only fixes a bug and brings back the ability to deploy in cluster
mode on multi-master clusters.
### How was this patch tested?
Closes #34911 from mohamadrezarostami/fix-a-bug-in-report-driver-status.
Authored-by: Mohamadreza Rostami <[email protected]>
Signed-off-by: yi.wu <[email protected]>
---
core/src/main/scala/org/apache/spark/deploy/Client.scala | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/Client.scala
b/core/src/main/scala/org/apache/spark/deploy/Client.scala
index 7c5ab43..15cca66 100644
--- a/core/src/main/scala/org/apache/spark/deploy/Client.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/Client.scala
@@ -190,13 +190,15 @@ private class ClientEndpoint(
logDebug(s"State of driver $submittedDriverID is ${state.get}, " +
               s"continue monitoring driver status.")
}
- }
- }
- } else {
+ }
+ }
+ } else if (exception.exists(e => Utils.responseFromBackup(e.getMessage))) {
+ logDebug(s"The status response is reported from a backup spark instance. So, ignored.")
+ } else {
logError(s"ERROR: Cluster master did not recognize $submittedDriverID")
System.exit(-1)
- }
}
+ }
override def receive: PartialFunction[Any, Unit] = {
case SubmitDriverResponse(master, success, driverId, message) =>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]