[
https://issues.apache.org/jira/browse/KYLIN-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yoonsung.lee reassigned KYLIN-4847:
-----------------------------------
Assignee: yoonsung.lee
> Cuboid to HFile step fails on a multiple-job-server environment because it tries
> to read the metrics jar file from the inactive job server's location.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-4847
> URL: https://issues.apache.org/jira/browse/KYLIN-4847
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v3.1.0
> Reporter: yoonsung.lee
> Assignee: yoonsung.lee
> Priority: Major
>
> h1. My Cluster Setting
> 1. Version: 3.1.0
> 2. 2 job servers (job & query mode) and 2 query-only servers, each running on a
> different host machine.
> 3. The Spark engine is used to build cubes.
> h1. Problem Circumstance
> h2. Root cause
> The active job server submits a Spark job to execute `Convert Cuboid Data to
> HFile`, but it gets an error because a resource needed to submit the Spark job
> has a wrong path that the active job server cannot read.
> * wrong resource:
> ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar
> * The ${KYLIN_HOME} prefix of this path is the inactive job server's install
> location, and this happens only for the above jar file.
> This situation occurs in the following two circumstances.
> h2. On build cube
> 1. Request the build API on the inactive job server (exactly:
> /kylin/api/cubes/${cube_name}/rebuild ).
> 2. The inactive job server stores the build task in the metadata store.
> 3. The active job server takes the build task and executes it.
> 4. The active job server fails on the `Convert Cuboid Data to HFile` step.
> **This does not occur when I request the build API on the active job server.**
> h2. On merge
> 1. A merge cube job is triggered periodically.
> 2. The active job server takes the merge task and executes it.
> 3. The active job server fails on the `Convert Cuboid Data to HFile` step.
> **This does not occur when there is only one job server in the cluster.**
> h1. Progress toward solving this
> I'm trying to find which code sets the metrics-core-2.2.0.jar path incorrectly.
> So far, my guess is that the following code sets metrics-core-2.2.0.jar for
> the `Cuboid to HFile` Spark job:
> *
> [https://github.com/apache/kylin/blob/kylin-3.1.0/storage-hbase/src/main/java/org/apache/kylin/storage/hbase/steps/HBaseSparkSteps.java#L69]
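One possible direction (a sketch under assumptions, not a patch): if the step persists an absolute jar path built on one server, re-rooting it under the executing server's own `KYLIN_HOME` at run time would make the path host-independent. The helper name and install paths below are hypothetical.

```java
public class JarPathLocalizer {
    /**
     * Re-root a persisted jar path under the local KYLIN_HOME. Hypothetical
     * helper: it assumes the jar's path relative to KYLIN_HOME is identical on
     * every job server, which matches the layout shown in the error message.
     */
    static String localize(String persistedPath, String persistedHome, String localHome) {
        if (!persistedPath.startsWith(persistedHome)) {
            // Not rooted at the remote install dir; leave it untouched.
            return persistedPath;
        }
        return localHome + persistedPath.substring(persistedHome.length());
    }

    public static void main(String[] args) {
        String persisted =
                "/opt/kylin-inactive/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar";
        // Prints the same jar re-rooted under the active server's install dir.
        System.out.println(localize(persisted, "/opt/kylin-inactive", "/opt/kylin-active"));
    }
}
```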
> h1. Questions
> 1. I'm trying to remote-debug with an IDE to confirm my guess, but the
> breakpoint on that line is never hit at runtime. It seems to be executed
> during the boot phase. Is that right?
> 2. Is there any hint or guess for solving this issue, regardless of my
> progress above?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)