[ 
https://issues.apache.org/jira/browse/OOZIE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892509#comment-13892509
 ] 

Ryota Egashira commented on OOZIE-1678:
---------------------------------------

A patch is available on RB.
- Current design
SLAService (SLACalculatorMemory) has an in-memory structure (slaMap) that 
keeps track of the SLA status of each job/action. When the Oozie server starts 
up, it loads from the DB (SLASummary and SLARegistration tables) and populates 
slaMap, then validates each SLA status and updates the DB. Afterwards, whenever 
a job status change occurs, slaMap is updated. SLAService also iterates through 
slaMap regularly.
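
The lifecycle described above (load at start-up, update on status change, 
periodic iteration) can be sketched roughly as follows. This is only an 
illustration: SlaMapSketch, SlaEntry and the method names are hypothetical and 
are not the actual SLACalculatorMemory code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: these names are hypothetical, not Oozie's actual classes.
class SlaMapSketch {

    // Stand-in for one row loaded from the SLASummary/SLARegistration tables.
    static final class SlaEntry {
        final String jobId;
        volatile String jobStatus;

        SlaEntry(String jobId, String jobStatus) {
            this.jobId = jobId;
            this.jobStatus = jobStatus;
        }
    }

    // In-memory view of the SLA status of each tracked job/action.
    final Map<String, SlaEntry> slaMap = new ConcurrentHashMap<>();

    // Start-up: populate slaMap from the rows read from the DB.
    void loadOnStartup(List<SlaEntry> rowsFromDb) {
        for (SlaEntry row : rowsFromDb) {
            slaMap.put(row.jobId, row);
        }
    }

    // Job status change: update the entry if this server tracks the job.
    // Returns false when the job has no entry in slaMap.
    boolean onJobStatusChange(String jobId, String newStatus) {
        SlaEntry entry = slaMap.get(jobId);
        if (entry == null) {
            return false;
        }
        entry.jobStatus = newStatus;
        return true;
    }

    // Periodic pass: visit every entry; this sketch only collects the IDs seen.
    List<String> periodicPass() {
        List<String> visited = new ArrayList<>();
        for (SlaEntry entry : slaMap.values()) {
            visited.add(entry.jobId);
        }
        return visited;
    }
}
```

With a single server, slaMap holds every tracked job, so the lookup in 
onJobStatusChange is expected to succeed.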

- HA approach
The initial loading at start-up is distributed among multiple Oozie instances 
using ZKJobsConcurrencyService. The issue is that the slaMap of each Oozie 
instance becomes a partial set. There is no guarantee that the slaMap of a 
given instance contains an entry for a job running on that instance (the mod 
split of ZKJobsConcurrencyService has nothing to do with where a job runs, so 
it can result in a cache miss). In that case, slaMap adds a new entry for the 
job (which involves extra DB calls) upon receiving the job status change. 
Since a job can run across multiple instances, slaMap can get out of sync and 
may include an obsolete entry for a job that is no longer running on the 
instance. The regular iteration can still pick up the obsolete entry, but it 
always validates against the DB, so consistency is kept (although 
inefficiently). To keep slaMap in sync, an Oozie instance could broadcast each 
update to the other instances, but I thought that was a bit overkill, so it is 
not implemented. 
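
A rough sketch of the HA behaviour described above: a mod split decides which 
instance loads a job at start-up, a cache miss on a status change costs an 
extra DB call, and the periodic pass validates entries against the DB. All 
names here (SlaHaSketch, loadsAtStartup, dbLookup, dbCalls) are hypothetical, 
and the hash-based split is only an assumption for illustration, not 
necessarily how ZKJobsConcurrencyService assigns jobs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: hypothetical names, not the code in the patch.
class SlaHaSketch {

    // Assumed mod split: server `index` out of `count` loads a job at
    // start-up when the job ID's hash falls into its bucket.
    static boolean loadsAtStartup(String jobId, int index, int count) {
        return Math.floorMod(jobId.hashCode(), count) == index;
    }

    final Map<String, String> slaMap = new HashMap<>(); // jobId -> status
    final Function<String, String> dbLookup;            // stand-in for a DB read
    int dbCalls = 0;                                     // counts simulated DB reads

    SlaHaSketch(Function<String, String> dbLookup) {
        this.dbLookup = dbLookup;
    }

    // Status change: on a cache miss, read the job from the DB first
    // (the extra DB call), then record the new status in slaMap.
    void onJobStatusChange(String jobId, String newStatus) {
        if (!slaMap.containsKey(jobId)) {
            dbCalls++;
            dbLookup.apply(jobId); // result is not needed in this sketch
        }
        slaMap.put(jobId, newStatus);
    }

    // Periodic pass: validate every entry against the DB, so an entry that
    // went stale on this instance is corrected at the cost of one read each.
    void periodicPass() {
        for (Map.Entry<String, String> e : slaMap.entrySet()) {
            dbCalls++;
            String dbStatus = dbLookup.apply(e.getKey());
            if (dbStatus != null && !dbStatus.equals(e.getValue())) {
                e.setValue(dbStatus);
            }
        }
    }
}
```

In this sketch the DB stays the source of truth, which is why a stale slaMap 
entry costs extra reads but does not lead to an inconsistent result.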



> HA support for SLA
> ------------------
>
>                 Key: OOZIE-1678
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1678
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: HA
>            Reporter: Ryota Egashira
>            Assignee: Ryota Egashira
>
> SLA Service needs to be changed to perform SLA calculation on multiple oozie 
> servers in HA setting.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
