[
https://issues.apache.org/jira/browse/OOZIE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892509#comment-13892509
]
Ryota Egashira commented on OOZIE-1678:
---------------------------------------
The patch is available on RB.
- Current design
SLAService (SLACalculatorMemory) has an in-memory structure (slaMap) that keeps
track of the SLA status of each job/action. When the Oozie server starts up, it
loads from the DB (SLASummary and SLARegistration tables) and populates slaMap,
then validates each SLA status and updates the DB. Afterwards, slaMap is updated
whenever a job status change occurs. SLAService also iterates through slaMap
at regular intervals.
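
To make the lifecycle above concrete, here is a minimal sketch of it: load at
start-up, update on status change, regular iteration. The class and method names
(SlaMapSketch, loadOnStartup, onJobStatusChange, iterateOnce) are hypothetical,
and a plain Map stands in for the DB tables; this is not the actual
SLACalculatorMemory code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the design described above; not the Oozie classes.
class SlaMapSketch {
    // Stand-in for the SLASummary/SLARegistration tables: job id -> job status.
    private final Map<String, String> db;
    // In-memory view kept by one server: job id -> last known job status.
    private final Map<String, String> slaMap = new ConcurrentHashMap<>();

    SlaMapSketch(Map<String, String> db) {
        this.db = db;
    }

    // Start-up: load every row from the DB stand-in into slaMap.
    void loadOnStartup() {
        slaMap.putAll(db);
    }

    // Called whenever a job status change occurs.
    void onJobStatusChange(String jobId, String newStatus) {
        slaMap.put(jobId, newStatus);
    }

    // One regular pass over slaMap; returns the job ids it visited.
    List<String> iterateOnce() {
        return new ArrayList<>(slaMap.keySet());
    }

    // Last status this server has recorded for the job, or null if untracked.
    String statusOf(String jobId) {
        return slaMap.get(jobId);
    }
}
```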
- HA approach
Initial loading at start-up will be distributed among multiple Oozie instances
using ZKJobsConcurrencyService. The issue is that the slaMap of each Oozie
instance becomes a partial set. There is no guarantee that the slaMap of a given
instance contains an entry for a job running on that instance (the mod split
of ZKJobsConcurrencyService has nothing to do with where a job runs, so it can
result in a cache miss). In that case, slaMap adds a new entry for the job
(which involves extra DB calls) upon receiving the job status change.
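
A rough sketch of the partial-set problem, under stated assumptions: the real
split is done by ZKJobsConcurrencyService, and the hash-modulo rule below is
only a guess at what a "mod split" could look like. All names here
(PartitionedSlaMapSketch, ownsAtStartup, extraDbLookups) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a mod split and the resulting cache miss.
class PartitionedSlaMapSketch {
    private final int instanceIndex;      // position of this server, 0-based
    private final int instanceCount;      // number of live servers
    private final Map<String, String> db; // stand-in for the SLA tables
    private final Map<String, String> slaMap = new HashMap<>();
    private int extraDbLookups = 0;

    PartitionedSlaMapSketch(int instanceIndex, int instanceCount,
                            Map<String, String> db) {
        this.instanceIndex = instanceIndex;
        this.instanceCount = instanceCount;
        this.db = db;
    }

    // Assumed split rule: hash of the job id modulo the number of servers.
    boolean ownsAtStartup(String jobId) {
        return Math.floorMod(jobId.hashCode(), instanceCount) == instanceIndex;
    }

    // Start-up: load only the share of rows assigned to this server.
    void loadOnStartup() {
        for (Map.Entry<String, String> row : db.entrySet()) {
            if (ownsAtStartup(row.getKey())) {
                slaMap.put(row.getKey(), row.getValue());
            }
        }
    }

    // A status change can arrive for a job this server never loaded.
    void onJobStatusChange(String jobId, String newStatus) {
        if (!slaMap.containsKey(jobId)) {
            extraDbLookups++;             // cache miss: extra DB call
            if (db.get(jobId) == null) {
                return;                   // nothing registered for this job
            }
        }
        slaMap.put(jobId, newStatus);
    }

    boolean tracks(String jobId) { return slaMap.containsKey(jobId); }
    int size() { return slaMap.size(); }
    int extraDbLookups() { return extraDbLookups; }
}
```

With two instances, each one starts with only part of the rows, and the first
status change for a job outside an instance's share costs an extra lookup.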
Since a job could run across multiple instances, slaMap could get out of sync
and might include an obsolete entry for a job no longer running on the
instance. The regular iteration could still pick up the obsolete entry, but it
always validates against the DB, so consistency is kept (although it is
inefficient). To keep slaMap in sync, an Oozie instance could broadcast each
update to the other instances, but I thought that was a bit overkill, so it is
not implemented.
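
The validate-against-DB pass can be sketched roughly as follows. This is an
illustration only: the names (SlaValidationSketch, validateAll, dbReads) are
hypothetical, a Map stands in for the DB, and dropping entries that have no DB
row is an assumption made for the sketch, not something stated above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical illustration: the DB is the source of truth, so a stale local
// entry is corrected on the next pass, at the cost of one DB read per entry.
class SlaValidationSketch {
    private final Map<String, String> db; // stand-in for the SLA tables
    private final Map<String, String> slaMap = new HashMap<>();
    private int dbReads = 0;

    SlaValidationSketch(Map<String, String> db) {
        this.db = db;
    }

    void track(String jobId, String status) { slaMap.put(jobId, status); }
    String localStatus(String jobId) { return slaMap.get(jobId); }
    int dbReads() { return dbReads; }

    // One regular pass: re-check every local entry against the DB.
    void validateAll() {
        Iterator<Map.Entry<String, String>> it = slaMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            dbReads++;                          // one read per entry, stale or not
            String dbStatus = db.get(entry.getKey());
            if (dbStatus == null) {
                it.remove();                    // assumption: no row, drop entry
            } else if (!dbStatus.equals(entry.getValue())) {
                entry.setValue(dbStatus);       // local copy was obsolete
            }
        }
    }
}
```

The read count grows with the size of slaMap even when nothing is stale, which
is the inefficiency mentioned above.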
> HA support for SLA
> ------------------
>
> Key: OOZIE-1678
> URL: https://issues.apache.org/jira/browse/OOZIE-1678
> Project: Oozie
> Issue Type: Sub-task
> Components: HA
> Reporter: Ryota Egashira
> Assignee: Ryota Egashira
>
> SLA Service needs to be changed to perform SLA calculation on multiple oozie
> servers in HA setting.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)