[ 
https://issues.apache.org/jira/browse/KYLIN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205839#comment-15205839
 ] 

Richard Calaba edited comment on KYLIN-1319 at 4/28/16 6:13 PM:
----------------------------------------------------------------

Hello,

In our case, we use the MapR distribution with multiple nodes. We run this 
service on 2 nodes to eliminate the SPOF and provide "HA".
The service is active on only one node at any given time. If the service fails 
(or the node is shut down), the 2nd node takes over and activates the service.

The problem is that we have to hard-code the property 
kylin.job.yarn.app.rest.check.status.url to one of the 2 nodes running this 
service. If that node fails, Kylin stops building cubes (after Step 2), even 
if the MR job finished with SUCCESS. The reason is that the web service 
specified in yarn.resourcemanager.webapp.address cannot be reached.

As a temporary workaround, could the property 
yarn.resourcemanager.webapp.address be enhanced to accept a list of multiple 
web services instead of a single service URL?

In addition, the Job Engine doesn't report any details about why the execution 
of the Build Process stopped. At least some additional indication, like "Cannot 
Check MR Job Status", would be nice.
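
To make the request concrete, here is a minimal sketch of the failover idea, 
not Kylin's actual implementation. It assumes a hypothetical list of candidate 
web service base URLs and tries each in turn, using the REST path quoted in the 
issue description below. The function names (http_get, check_job_status) and 
the injectable fetch parameter are illustrative assumptions; the "app", "state" 
and "finalStatus" fields follow the YARN ResourceManager REST response.

```python
import json
import urllib.request
from typing import Callable, Iterable


def http_get(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_job_status(base_urls: Iterable[str], app_id: str,
                     fetch: Callable[[str], str] = http_get) -> dict:
    """Ask each candidate web service in order; return the first usable answer.

    base_urls is a hypothetical list such as ["http://node1:8088", ...];
    it is not an existing Kylin or YARN setting.
    """
    errors = []
    for base in base_urls:
        url = f"{base.rstrip('/')}/ws/v1/cluster/apps/{app_id}?anonymous=true"
        try:
            app = json.loads(fetch(url))["app"]
            return {"state": app["state"], "finalStatus": app["finalStatus"]}
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # Unreachable node or unexpected payload: remember why, try the next.
            errors.append(f"{base}: {exc}")
    # Every candidate failed, so report it instead of stopping silently.
    raise RuntimeError("Cannot check MR job status; tried " + "; ".join(errors))
```

With two nodes configured, a failure of the first node costs one failed request 
before the second node answers, and a failure of both produces an explicit 
"Cannot check MR job status" error instead of a silent stop.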



was (Author: [email protected]):
Hello,

In our case, we use the MapR distribution with multiple nodes. We run this 
service on 2 nodes to eliminate the SPOF and provide "HA".
The service is active on only one node at any given time. If the service fails 
(or the node is shut down), the 2nd node takes over and activates the service.

The problem is that we have to hard-code the property 
yarn.resourcemanager.webapp.address to one of the 2 nodes running this service. 
If that node fails, Kylin stops building cubes (after Step 2), even if the MR 
job finished with SUCCESS. The reason is that the web service specified in 
yarn.resourcemanager.webapp.address cannot be reached.

As a temporary workaround, could the property 
yarn.resourcemanager.webapp.address be enhanced to accept a list of multiple 
web services instead of a single service URL?

In addition, the Job Engine doesn't report any details about why the execution 
of the Build Process stopped. At least some additional indication, like "Cannot 
Check MR Job Status", would be nice.


> Find a better way to check hadoop job status
> --------------------------------------------
>
>                 Key: KYLIN-1319
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1319
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: liyang
>            Assignee: Zhong Yanghong
>              Labels: newbie
>
> Currently Kylin retrieves job status via a resource manager web service like 
> {code}https://<your_rm_server>:<port>/ws/v1/cluster/apps/${job_id}?anonymous=true{code}
> This is not the most robust approach. Some users do not have 
> "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so the status 
> check fails out of the box. They have to set the Kylin property 
> "kylin.job.yarn.app.rest.check.status.url" to work around this, which is not 
> user-friendly.
> Kerberos authentication might cause problems too if security is enabled.
> Is there a more robust way to check job status? Via the Job API?
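
The lookup order described in the issue can be sketched roughly as follows. 
This illustrates the described behaviour and is not Kylin's code: the function 
name resolve_status_url, the plain dicts standing in for kylin.properties and 
yarn-site.xml, and the "http://" scheme used for the derived URL are all 
assumptions.

```python
KYLIN_KEY = "kylin.job.yarn.app.rest.check.status.url"
YARN_KEY = "yarn.resourcemanager.webapp.address"


def resolve_status_url(kylin_props: dict, yarn_conf: dict, job_id: str) -> str:
    """Return the status-check URL for job_id.

    Prefer the explicit Kylin override; otherwise derive the URL from the
    resource manager web address. Fail loudly if neither is configured.
    """
    template = kylin_props.get(KYLIN_KEY)
    if not template:
        address = yarn_conf.get(YARN_KEY)
        if not address:
            raise LookupError(
                f"Cannot check MR job status: set {KYLIN_KEY} or {YARN_KEY}")
        # Assumed scheme; a secured cluster would likely need https here.
        template = "http://" + address + "/ws/v1/cluster/apps/${job_id}?anonymous=true"
    # Substitute the ${job_id} placeholder shown in the URL above.
    return template.replace("${job_id}", job_id)
```

Raising an explicit error when neither setting is present is one way to avoid 
the out-of-the-box failure going unexplained.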



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
