[ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713062#comment-14713062
 ] 

ASF GitHub Bot commented on FLINK-2399:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/945#discussion_r37977616
  
    --- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
    @@ -184,26 +188,32 @@ class JobManager(
           }
           else {
             try {
    -          val instanceID = instanceManager.registerTaskManager(
    -            taskManager,
    -            connectionInfo,
    -            hardwareInformation,
    -            numberOfSlots,
    -            leaderSessionID)
    +        
    +          if(jobManagerVersionID == taskManagerVersionID) {
    +            val instanceID = instanceManager.registerTaskManager(
    +              taskManager,
    +              connectionInfo,
    +              hardwareInformation,
    +              numberOfSlots,
    +              leaderSessionID)
     
    -          // IMPORTANT: Send the response to the "sender", which is not the
    -          //            TaskManager actor, but the ask future!
    -          sender() ! decorateMessage(
    -            AcknowledgeRegistration(
    -              registrationSessionID,
    -              leaderSessionID.get,
    -              self,
    -              instanceID,
    -              libraryCacheManager.getBlobServerPort)
    -          )
    +            // IMPORTANT: Send the response to the "sender", which is not 
the
    +            //            TaskManager actor, but the ask future!
    +            sender() ! decorateMessage(
    +              AcknowledgeRegistration(
    +                registrationSessionID,
    +                leaderSessionID.get,
    +                self,
    +                instanceID,
    +                libraryCacheManager.getBlobServerPort,
    +                jobManagerVersionID)
    +            )
     
    -          // to be notified when the taskManager is no longer reachable
    -          context.watch(taskManager)
    +            // to be notified when the taskManager is no longer reachable
    +            context.watch(taskManager)
    +          } else{
    +            throw new Exception("Version mismatch error between Job 
Manager and Task Manager")
    --- End diff --
    
    This will kill the `JobManager`. This is not a good solution. Best you 
revert the verification between the `TaskManager` and the `JobManager`.


> Fail when actor versions don't match
> ------------------------------------
>
>                 Key: FLINK-2399
>                 URL: https://issues.apache.org/jira/browse/FLINK-2399
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9, master
>            Reporter: Ufuk Celebi
>            Assignee: Sachin Goel
>            Priority: Minor
>             Fix For: 0.10
>
>
> Problem: there can be subtle errors when actors from different Flink versions 
> communicate with each other, for example when an old client (e.g. Flink 0.9) 
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the 
> actors and fail if they don't match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to