[ 
https://issues.apache.org/jira/browse/KYLIN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865843#comment-17865843
 ] 

pengfei.zhan commented on KYLIN-5857:
-------------------------------------

h1. Root Cause


Job nodes have two roles: master and slave. The master node holds the project 
epoch. When the master triggers a task and writes it to the job lock table, it 
records its own IP in the task. If the master node itself executes the job, no 
issue occurs. However, if a slave node executes the task, Spark calls the KE 
API during task execution to update the stage state. That call is routed to 
the master node based on the epoch, and when the master updates the task 
output it also writes its own IP into the task metadata, so the metadata 
records the master's IP even though the slave is running the task.

h1. Dev Design


When a node updates the task output, it should write its own IP to the task 
metadata only if the task is executing on that node. The KE API calls that 
Spark makes during task execution should not need to be routed based on the 
epoch.
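
The rule above can be illustrated with a minimal sketch. This is not the Kylin 
implementation: the names TaskMetadata, update_task_output and 
running_task_ids, as well as the IP addresses, are hypothetical and exist only 
to show the intended guard on the IP update.

```python
from dataclasses import dataclass, field


@dataclass
class TaskMetadata:
    """Hypothetical stand-in for the stored metadata of one task."""
    task_id: str
    node_ip: str  # IP of the node recorded as executing the task
    output: dict = field(default_factory=dict)


def update_task_output(task, current_node_ip, running_task_ids, new_output):
    """Merge new_output into the task output.

    The node IP is rewritten only when the task is executing on the node
    that handles this update (assumed to be tracked in running_task_ids).
    """
    task.output.update(new_output)
    if task.task_id in running_task_ids:
        task.node_ip = current_node_ip
    return task


MASTER_IP = "10.0.0.1"  # illustrative addresses only
SLAVE_IP = "10.0.0.2"

# The master triggers the task and records its own IP.
task = TaskMetadata(task_id="job-1", node_ip=MASTER_IP)

# The slave executes the task and reports output: the IP becomes the slave's.
update_task_output(task, SLAVE_IP, {"job-1"}, {"status": "RUNNING"})

# A stage update from Spark is handled by the master, which is not running
# the task, so the recorded IP must stay unchanged.
update_task_output(task, MASTER_IP, set(), {"stage": "BUILD_DICT"})

print(task.node_ip)
```

With the guard in place, the recorded IP stays on the slave after the master 
handles the stage update. Without the guard, the last call would reset it to 
the master's IP, which is the behavior described under Root Cause.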

> Fix job scheduler related problems
> ----------------------------------
>
>                 Key: KYLIN-5857
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5857
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: 5.0.0
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
