[ https://issues.apache.org/jira/browse/TAJO-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541444#comment-14541444 ]
ASF GitHub Bot commented on TAJO-1586:
--------------------------------------
GitHub user blrunner opened a pull request:
https://github.com/apache/tajo/pull/566
TAJO-1586: TajoMaster HA startup failure on Yarn.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/blrunner/tajo TAJO-1586
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/566.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #566
----
commit 40fdee40d812369a59b9c6f48e9ebd4c2de2eccd
Author: JaeHwa Jung
Date: 2015-05-11T08:14:15Z
Rename active master file name
commit 34ece4ce7f560257ffcb53b36d1273fbb9b88342
Author: JaeHwa Jung
Date: 2015-05-12T14:18:50Z
Add active lock file.
commit 8a6ca1b7fb51caa13e418618a3896cc83318ca02
Author: JaeHwa Jung
Date: 2015-05-12T14:19:10Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
commit 2c498a1dae4a53e365af56110bd9c43c94e37225
Author: JaeHwa Jung
Date: 2015-05-13T06:22:59Z
Refactor code for TajoMaster HA
commit 0c47c86117730966c6e3c3d479b715840c039cc7
Author: JaeHwa Jung
Date: 2015-05-13T06:23:22Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
----
> TajoMaster HA startup failure on Yarn.
> --------------------------------------
>
> Key: TAJO-1586
> URL: https://issues.apache.org/jira/browse/TAJO-1586
> Project: Tajo
> Issue Type: Bug
> Components: tajo master
> Affects Versions: 0.10.0
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.11.0, 0.10.1
>
> Attachments: TAJO-1586.patch
>
>
> I tried to deploy Tajo on YARN with Slider, but the deployment failed because
> of a TajoMaster HA failure: TajoWorker could not load the TajoMaster address,
> as shown below.
> {code}
> 2015-04-28 04:52:22,266 INFO org.apache.hadoop.service.AbstractService: Service org.apache.tajo.worker.TajoWorker failed in state STARTED; cause: org.apache.tajo.service.ServiceTrackerException: org.apache.tajo.service.ServiceTrackerException: No active master entry
> org.apache.tajo.service.ServiceTrackerException: org.apache.tajo.service.ServiceTrackerException: No active master entry
> 	at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:441)
> 	at org.apache.tajo.ha.HdfsServiceTracker.getUmbilicalAddress(HdfsServiceTracker.java:348)
> 	at org.apache.tajo.worker.TajoWorker.serviceStart(TajoWorker.java:318)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.tajo.worker.TajoWorker.startWorker(TajoWorker.java:141)
> 	at org.apache.tajo.worker.TajoWorker.main(TajoWorker.java:627)
> Caused by: org.apache.tajo.service.ServiceTrackerException: No active master entry
> 	at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:413)
> 	... 5 more
> 2015-04-28 04:52:22,307 INFO org.apache.hadoop.service.AbstractService: Service WorkerHeartbeatService failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> 	at org.apache.tajo.worker.WorkerHeartbeatService$WorkerHeartbeatThread.access$000(WorkerHeartbeatService.java:101)
> 	at org.apache.tajo.worker.WorkerHeartbeatService.serviceStop(WorkerHeartbeatService.java:90)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> 	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> 	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> 	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> 	at org.apache.tajo.worker.TajoWorker.serviceStop(TajoWorker.java:375)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> 	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> 	at org.apache.tajo.worker.TajoWorker.startWorker(TajoWorker.java:141)
> 	at org.apache.tajo.worker.TajoWorker.main(TajoWorker.java:627)
> {code}
> I think this failure is caused by a timing difference between TajoMaster
> and TajoWorker.
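If the timing difference is a startup-ordering race (the worker looks up the active master entry before TajoMaster has written it), one possible mitigation is for the worker to retry the lookup for a bounded time instead of failing on the first miss. The Java sketch below is hypothetical and is not necessarily what the attached patch does: `ActiveMasterLookup` and `awaitActiveMaster` are illustrative names, not Tajo APIs, and the lookup is abstracted as a `Supplier<Optional<String>>` rather than calling `HdfsServiceTracker` directly.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Tajo: polls an active-master lookup until it
 * returns an address, so a worker that starts before the master has registered
 * waits for a bounded time instead of failing on the first miss.
 */
final class ActiveMasterLookup {

  private ActiveMasterLookup() {}

  /**
   * @param lookup      returns the active master address, or empty if no entry exists yet
   * @param maxAttempts total number of lookups to try (must be at least 1)
   * @param sleepMillis pause between failed lookups, in milliseconds
   * @return the first address reported by {@code lookup}
   * @throws IllegalStateException if no entry appears within {@code maxAttempts} lookups
   */
  static String awaitActiveMaster(Supplier<Optional<String>> lookup,
                                  int maxAttempts,
                                  long sleepMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Optional<String> address = lookup.get();
      if (address.isPresent()) {
        return address.get();
      }
      if (attempt < maxAttempts) { // no point sleeping after the last miss
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw new IllegalStateException("Interrupted while waiting for an active master", e);
        }
      }
    }
    throw new IllegalStateException(
        "No active master entry after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    // Simulated tracker: the master "registers" itself on the third lookup.
    int[] calls = {0};
    Supplier<Optional<String>> fakeLookup = () -> {
      calls[0]++;
      return calls[0] >= 3 ? Optional.of("master-host:1234") : Optional.empty();
    };
    System.out.println(awaitActiveMaster(fakeLookup, 5, 10L));
  }
}
```

Running `main` prints `master-host:1234` after two simulated misses; the address is a placeholder. Bounding the number of attempts keeps a genuinely missing master from hanging the worker forever: after the last miss the worker still fails with the same "No active master entry" condition seen in the log.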
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)