[
https://issues.apache.org/jira/browse/ATLAS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carol Drummond updated ATLAS-3168:
----------------------------------
Labels: no-release-note (was: )
> PatchFx: Support for HA Mode
> ----------------------------
>
> Key: ATLAS-3168
> URL: https://issues.apache.org/jira/browse/ATLAS-3168
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core
> Affects Versions: 2.0.0, trunk
> Reporter: Ashutosh Mestry
> Assignee: Ashutosh Mestry
> Priority: Major
> Labels: no-release-note
> Fix For: 2.0.0, trunk
>
> Attachments: ATLAS-3168-PatchFx-Fix-for-Startup-in-HA-mode.patch,
> ATLAS-3168-PatchFx-Unit-test-fixes-and-optimization.patch
>
>
> *Description*
> PatchFx in HA mode causes exceptions.
> *Steps to Duplicate*
> Deploy the latest version of Atlas on a cluster with HA deployment.
> The following error appears during startup:
> {code:java}
> 2019-04-23 03:54:22,280 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:160)
> java.lang.NullPointerException
>     at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.createTableIfNotExists(HBaseBasedAuditRepository.java:521)
>     at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.instanceIsActive(HBaseBasedAuditRepository.java:627)
>     at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:154)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
>     at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
>     at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435)
>     at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:660)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:539)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch.access$700(LeaderLatch.java:65)
>     at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:590)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:865)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:635)
>     at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> 2019-04-23 03:54:22,280 WARN - [main-EventThread:] ~ Server instance with server id id2 is removed as leader (ActiveInstanceElectorService:197)
> {code}
> *Root Cause*
> Pattern followed within Atlas:
> * _Service.start_ is called when _Services_ is initialized.
> * For every service:
> ** Atlas is not in HA mode: Start and perform startup specific actions.
> ** Atlas is in HA mode: Start and wait for _instanceIsActive_ to be called.
> _AtlasPatchService_ did not follow this pattern:
> * It did not implement _ActiveStateChangeHandler_.
> * It was not registered with _ActiveStateChangeHandler.HandlerOrder_.
> As a result, _AtlasPatchService.start_ patched the database immediately,
> before _AtlasTypeDefStoreInitializer_ had been initialized, which caused
> exceptions. _ActiveInstanceElectorService_ received a callback from ZK
> asking it to call the _instanceIsActive_ method on
> _HBaseBasedAuditRepository_, which had not been started. This produced the
> NullPointerException shown in the stack trace.
> *Solution*
> Modify _AtlasPatchService_ to follow the pattern used for other services.
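
The pattern described under Root Cause might be sketched roughly as follows. This is a minimal illustration only, not the code from the attached patches: the two interfaces are simplified local stand-ins for the Atlas types of the same name, and PatchServiceSketch, haEnabled, applyPatches and patchesApplied are hypothetical names introduced here.

```java
// Simplified local stand-ins for the Atlas interfaces named in the issue.
// The real signatures may differ; these are assumptions for illustration.
interface Service {
    void start() throws Exception;
    void stop() throws Exception;
}

interface ActiveStateChangeHandler {
    void instanceIsActive() throws Exception;
    void instanceIsPassive() throws Exception;
}

// Hypothetical patch service that follows the start / instanceIsActive pattern.
class PatchServiceSketch implements Service, ActiveStateChangeHandler {
    private final boolean haEnabled; // assumed to be read from configuration
    private boolean patchesApplied;

    PatchServiceSketch(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    @Override
    public void start() {
        if (haEnabled) {
            // HA mode: do nothing yet; wait for instanceIsActive().
            return;
        }
        // Non-HA mode: perform startup-specific actions right away.
        applyPatches();
    }

    @Override
    public void stop() {
        // Nothing to release in this sketch.
    }

    @Override
    public void instanceIsActive() {
        // Called once this instance has been elected active.
        applyPatches();
    }

    @Override
    public void instanceIsPassive() {
        // A passive instance does not patch the database.
    }

    private void applyPatches() {
        // Placeholder for the real patching work.
        patchesApplied = true;
    }

    boolean patchesApplied() {
        return patchesApplied;
    }
}
```

In this sketch, calling start() on an HA-enabled instance leaves the database untouched until instanceIsActive() is invoked, whereas a non-HA instance patches during start(). The ordering relative to other handlers, which the issue ties to _ActiveStateChangeHandler.HandlerOrder_, is not modeled here.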