[
https://issues.apache.org/jira/browse/KYLIN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205839#comment-15205839
]
Richard Calaba edited comment on KYLIN-1319 at 4/28/16 6:13 PM:
----------------------------------------------------------------
Hello,
In our case we use the MapR distribution with multiple nodes. The resource
manager web service runs on 2 nodes to eliminate the SPOF and provide "HA".
The service is active on only one node at any given time. If the service fails
(or the node is shut down), the 2nd node takes over and activates the service.
The problem is that we have to hard-code the property
kylin.job.yarn.app.rest.check.status.url to one of the 2 nodes running this
service. If that node fails, Kylin stops building cubes (after Step 2), even
if the MR job finished with SUCCESS. The reason is that the web service
specified in yarn.resourcemanager.webapp.address cannot be reached.
As a temporary workaround, could the property
yarn.resourcemanager.webapp.address be enhanced to accept a list of multiple
web services instead of a single service URL?
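To illustrate the idea, here is a minimal sketch (in Python, not Kylin code) of
checking the job status against a list of candidate base URLs in order. The
host names, the function names and the URL_TEMPLATE constant are hypothetical;
the URL shape follows the one quoted in the issue description, and the
"app"/"finalStatus" fields are the ones the YARN REST API returns for a single
application.

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical template, modeled on the URL shape quoted in the issue description.
URL_TEMPLATE = "{base}/ws/v1/cluster/apps/{job_id}?anonymous=true"


def http_get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_job_status(
    base_urls: Iterable[str],
    job_id: str,
    fetch: Callable[[str], dict] = http_get_json,
) -> Tuple[Optional[str], List[str]]:
    """Try each base URL in order.

    Returns (final_status, errors). final_status is None when no URL
    answered; errors lists one message per failed attempt.
    """
    errors: List[str] = []
    for base in base_urls:
        url = URL_TEMPLATE.format(base=base.rstrip("/"), job_id=job_id)
        try:
            body = fetch(url)
            return body["app"]["finalStatus"], errors
        except (OSError, KeyError, TypeError, ValueError) as exc:
            # Unreachable node, or a response that is not the expected JSON:
            # remember the failure and fall through to the next candidate.
            errors.append(f"{url}: {exc!r}")
    return None, errors
```

With two configured nodes, a call such as
check_job_status(["http://node1:8088", "http://node2:8088"], job_id) would only
give up after both nodes failed to answer.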
In addition, the Job Engine doesn't report any details about why the execution
of the Build Process stopped. At least some additional indication like "Cannot
Check MR Job Status" would be nice.
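As a rough sketch of such an indication (again Python, not Kylin code; the
class and function names are hypothetical), the status check could raise an
error that names every endpoint it tried, so the message can be shown next to
the stopped build step:

```python
from typing import List, Optional


class JobStatusCheckError(RuntimeError):
    """Raised when no status endpoint could be queried (hypothetical name)."""


def require_status(status: Optional[str], failed_attempts: List[str]) -> str:
    """Return the status if one was obtained, else raise with the details."""
    if status is not None:
        return status
    details = "; ".join(failed_attempts) or "no endpoints configured"
    raise JobStatusCheckError(f"Cannot Check MR Job Status: {details}")
```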
> Find a better way to check hadoop job status
> --------------------------------------------
>
> Key: KYLIN-1319
> URL: https://issues.apache.org/jira/browse/KYLIN-1319
> Project: Kylin
> Issue Type: Improvement
> Reporter: liyang
> Assignee: Zhong Yanghong
> Labels: newbie
>
> Currently Kylin retrieves job status via a resource manager web service like
> {code}https://<your_rm_server>:<port>/ws/v1/cluster/apps/${job_id}?anonymous=true{code}
> This is not very robust. Some users do not have
> "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so the status
> check fails out of the box. They have to set the Kylin property
> "kylin.job.yarn.app.rest.check.status.url" to work around this, which is not
> user friendly.
> Kerberos authentication might cause problems too if security is enabled.
> Is there a more robust way to check job status? Via the Job API?