[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713062#comment-14713062 ]
ASF GitHub Bot commented on FLINK-2399:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/945#discussion_r37977616
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -184,26 +188,32 @@ class JobManager(
}
else {
try {
- val instanceID = instanceManager.registerTaskManager(
- taskManager,
- connectionInfo,
- hardwareInformation,
- numberOfSlots,
- leaderSessionID)
+
+ if(jobManagerVersionID == taskManagerVersionID) {
+ val instanceID = instanceManager.registerTaskManager(
+ taskManager,
+ connectionInfo,
+ hardwareInformation,
+ numberOfSlots,
+ leaderSessionID)
- // IMPORTANT: Send the response to the "sender", which is not the
- // TaskManager actor, but the ask future!
- sender() ! decorateMessage(
- AcknowledgeRegistration(
- registrationSessionID,
- leaderSessionID.get,
- self,
- instanceID,
- libraryCacheManager.getBlobServerPort)
- )
+ // IMPORTANT: Send the response to the "sender", which is not the
+ // TaskManager actor, but the ask future!
+ sender() ! decorateMessage(
+ AcknowledgeRegistration(
+ registrationSessionID,
+ leaderSessionID.get,
+ self,
+ instanceID,
+ libraryCacheManager.getBlobServerPort,
+ jobManagerVersionID)
+ )
- // to be notified when the taskManager is no longer reachable
- context.watch(taskManager)
+ // to be notified when the taskManager is no longer reachable
+ context.watch(taskManager)
+ } else{
+ throw new Exception("Version mismatch error between Job Manager and Task Manager")
--- End diff --
Throwing an exception here will kill the `JobManager`, which is not a good
solution. It would be best to revert the version check between the
`TaskManager` and the `JobManager`.
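
For illustration only, here is a minimal sketch of one alternative to throwing:
compute a refusal reply on a version mismatch and send that back to the
caller, so the handling actor keeps running. The names `RegistrationReply`,
`Acknowledge`, `Refuse`, and `handleRegistration` are hypothetical and are not
Flink's actual message classes or API; the `register` parameter stands in for
the `instanceManager.registerTaskManager(...)` call from the diff. This is not
necessarily what the pull request ended up doing.

```scala
// Hypothetical, simplified reply types; these are NOT Flink's real messages.
sealed trait RegistrationReply
final case class Acknowledge(instanceId: String, version: String) extends RegistrationReply
final case class Refuse(reason: String) extends RegistrationReply

object VersionCheckSketch {
  // Decide on a reply instead of throwing, so a mismatch does not take
  // down the component that handles the registration request.
  def handleRegistration(
      jobManagerVersion: String,
      taskManagerVersion: String,
      register: () => String): RegistrationReply = {
    if (jobManagerVersion == taskManagerVersion) {
      // Versions agree: perform the registration and acknowledge it.
      Acknowledge(register(), jobManagerVersion)
    } else {
      // Versions differ: skip the registration and report why.
      Refuse(
        s"Version mismatch: JobManager is $jobManagerVersion, " +
          s"TaskManager is $taskManagerVersion")
    }
  }

  def main(args: Array[String]): Unit = {
    println(handleRegistration("0.10-SNAPSHOT", "0.10-SNAPSHOT", () => "instance-1"))
    println(handleRegistration("0.10-SNAPSHOT", "0.9", () => "instance-2"))
  }
}
```

Inside the actor, the resulting reply would presumably be sent the same way
the diff sends the acknowledgement, i.e. `sender() ! decorateMessage(reply)`,
leaving it to the `TaskManager` side to decide how to react to a refusal.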
> Fail when actor versions don't match
> ------------------------------------
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Sachin Goel
> Priority: Minor
> Fix For: 0.10
>
>
> Problem: there can be subtle errors when actors from different Flink versions
> communicate with each other, for example when an old client (e.g. Flink 0.9)
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the
> actors and fail if they don't match.