[
https://issues.apache.org/jira/browse/KYLIN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865843#comment-17865843
]
pengfei.zhan commented on KYLIN-5857:
-------------------------------------
h1. Root Cause
Inductotherm job nodes have two roles: master and slave. The master node holds
the project epoch. When the master triggers a task and writes it to the job
lock table, it records its own IP on the task. If the master node also executes
the task, no issue occurs. However, if a slave node executes the task, Spark
calls the KE API to update the stage state during execution. This call is
routed to the master node based on the epoch, and when the master updates the
task output it also writes its own IP into the task metadata, so the metadata
no longer points at the node that is actually running the task.
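The sequence above can be reproduced in a toy model. This is only a sketch: the class, the method and field names, the task id, and the IP addresses are all hypothetical and are not taken from the Kylin code base. It also assumes that the executing node records its own IP when it starts the task.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failure; every name here is hypothetical, not a Kylin class.
class NodeIpOverwriteSketch {
    // Stands in for the node IP stored in task metadata, keyed by task id.
    static final Map<String, String> recordedIp = new HashMap<>();

    // Unconditional behaviour described above: whichever node handles the
    // output update writes its own IP into the task metadata.
    static void updateTaskOutput(String taskId, String handlingNodeIp) {
        recordedIp.put(taskId, handlingNodeIp);
    }

    public static void main(String[] args) {
        String masterIp = "10.0.0.1"; // epoch owner (example address)
        String slaveIp = "10.0.0.2";  // node that actually runs the task

        // Assumption: the executing node records itself when it starts the task.
        recordedIp.put("task-1", slaveIp);

        // Spark's stage-state call is routed to the epoch owner, so the master
        // handles the update even though the slave runs the task.
        updateTaskOutput("task-1", masterIp);

        System.out.println(recordedIp.get("task-1")); // prints 10.0.0.1
    }
}
```

After the routed update, the recorded IP is the master's even though the task runs on the slave.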
h1. Dev Design
When any node updates the task output, it should write its IP into the task
metadata only if the task is executing on that node. In addition, KE API calls
made by Spark during task execution should not need to be routed based on the
epoch.
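A minimal sketch of the first point, the guarded IP update, might look like the following. All names are hypothetical and do not come from the Kylin code base; in particular, the in-memory set of locally running task ids is an assumption about how a node could tell that a task is executing on it.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the guarded update; all names are hypothetical, not Kylin classes.
class GuardedNodeIpUpdateSketch {
    private final String localIp;
    // Ids of tasks this node is currently executing (assumed bookkeeping).
    private final Set<String> locallyRunning = ConcurrentHashMap.newKeySet();

    GuardedNodeIpUpdateSketch(String localIp) {
        this.localIp = localIp;
    }

    void markRunning(String taskId) {
        locallyRunning.add(taskId);
    }

    // Writes this node's IP only when the task runs here; returns whether it wrote.
    boolean updateTaskOutput(Map<String, String> recordedIp, String taskId) {
        if (!locallyRunning.contains(taskId)) {
            return false; // task runs on another node: skip the IP write
        }
        recordedIp.put(taskId, localIp);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> recordedIp = new ConcurrentHashMap<>();
        GuardedNodeIpUpdateSketch master = new GuardedNodeIpUpdateSketch("10.0.0.1");
        GuardedNodeIpUpdateSketch slave = new GuardedNodeIpUpdateSketch("10.0.0.2");

        slave.markRunning("task-1");
        slave.updateTaskOutput(recordedIp, "task-1");
        master.updateTaskOutput(recordedIp, "task-1"); // no-op: task is not local

        System.out.println(recordedIp.get("task-1")); // prints 10.0.0.2
    }
}
```

With the guard in place, an update handled by a node that is not running the task leaves the recorded IP untouched.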
> Fix job scheduler related problems
> ----------------------------------
>
> Key: KYLIN-5857
> URL: https://issues.apache.org/jira/browse/KYLIN-5857
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: 5.0.0
> Reporter: pengfei.zhan
> Assignee: pengfei.zhan
> Priority: Major
> Fix For: 5.0.0
>
>