[
https://issues.apache.org/jira/browse/OOZIE-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590907#comment-13590907
]
Rohini Palaniswamy commented on OOZIE-1244:
-------------------------------------------
I would like to put forth two possible approaches for discussion. The first
approach involves making SLA support part of core Oozie. The second
involves packaging it as a separate war file for isolation, which can still be
hosted in the same Tomcat as Oozie. Both proposed approaches involve:
* Building on top of the event system work being done in OOZIE-1209.
* Reusing the current sla_events table, which holds SLA registration events and
job status events.
* An SLACalculationService that receives SLA registration events for
submitted workflows or materialized coordinator actions. The registered
information will be kept in memory (with intelligent overflow to disk based on
expected start and end times) and checked periodically (let's say every 1 or 2
minutes) for SLA misses. Entries will be removed from memory once all
possible SLA events for them have been generated. Apart from the periodic
check, job status events will also trigger the calculation as they are
received. The generated SLA events will be sent to the EventHandlerService,
which will forward them to the registered SLAEventListeners.
* Three implementations of SLAEventListener: one to send JMS notifications,
one to do email alerting, and one to write the SLA events to a database table.
Let's call this table sla_info to avoid confusion with the existing sla_events
table. The SLA information in the database can be used for querying and for
building a dashboard.
* REST APIs to query the SLA information table.
* A simple dashboard to view and filter historical SLA information.
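
To make the calculation flow above concrete, here is a minimal sketch of the
in-memory check. It is only an illustration under stated assumptions: the class,
field and enum names are hypothetical and not existing Oozie code, overflow to
disk and thread safety are left out, and "SLA met" is assumed to mean that the
job ended without any miss having been generated.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Illustrative only: every name below is hypothetical, not Oozie API.
class SlaCalcSketch {

    enum SlaEvent { START_MISS, END_MISS, DURATION_MISS, MET }

    /** In-memory SLA registration for one workflow or coordinator action. */
    static final class Registration {
        final String jobId;
        final long expectedStart;     // epoch millis
        final long expectedEnd;       // epoch millis
        final long expectedDuration;  // millis
        Long actualStart;             // set when a "started" status event arrives
        Long actualEnd;               // set when a terminal status event arrives
        final Set<SlaEvent> emitted = EnumSet.noneOf(SlaEvent.class);

        Registration(String jobId, long expectedStart, long expectedEnd, long expectedDuration) {
            this.jobId = jobId;
            this.expectedStart = expectedStart;
            this.expectedEnd = expectedEnd;
            this.expectedDuration = expectedDuration;
        }
    }

    /** Stand-in for the SLAEventListener implementations (JMS, email, DB). */
    interface Listener {
        void onEvent(String jobId, SlaEvent event);
    }

    private final List<Registration> registered = new ArrayList<>();
    private final List<Listener> listeners = new ArrayList<>();

    void register(Registration r) { registered.add(r); }
    void addListener(Listener l) { listeners.add(l); }
    int pending() { return registered.size(); }

    /** Run from the periodic timer, and again whenever a status event updates a registration. */
    void check(long now) {
        Iterator<Registration> it = registered.iterator();
        while (it.hasNext()) {
            Registration r = it.next();
            // Until the real timestamp is known, compare against the current time.
            long startSeen = r.actualStart != null ? r.actualStart : now;
            long endSeen = r.actualEnd != null ? r.actualEnd : now;

            if (startSeen > r.expectedStart) emit(r, SlaEvent.START_MISS);
            if (endSeen > r.expectedEnd) emit(r, SlaEvent.END_MISS);
            if (r.actualStart != null && endSeen - r.actualStart > r.expectedDuration) {
                emit(r, SlaEvent.DURATION_MISS);
            }
            if (r.actualEnd != null) {
                // Job is done: no further events are possible, so drop it from memory.
                if (r.emitted.isEmpty()) emit(r, SlaEvent.MET);
                it.remove();
            }
        }
    }

    /** Notifies listeners once per (job, event) pair. */
    private void emit(Registration r, SlaEvent e) {
        if (r.emitted.add(e)) {
            for (Listener l : listeners) l.onEvent(r.jobId, e);
        }
    }
}
```

In this sketch the same check() serves both triggers: the timer calls it every
minute or two, and the handler for a job status event calls it right after
setting actualStart or actualEnd.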
Design 1 (Core Oozie):
* If it lives in core Oozie, SLACalculationService will implement
WorkflowEventListener and CoordinatorEventListener and receive materialization
events and job status events through them. For recovery after a restart, it
will query the SLA_EVENTS table to get the registered events.
Pros:
* Part of core Oozie. Easy to configure and set up.
* Acting on job events directly will make processing faster.
* The existing framework for JMS and DB access can be reused.
Cons:
* SLA processing will consume CPU cycles, affecting core functionality.
Design 2 (Separate Service):
* The service will reuse the oozie-core framework: EventHandlerService,
JMSAccessorService (JMS notifications), JPAService (writing sla_info to the
database), etc.
* SLACalculationService here will rely mainly on fetching registration and
status events from Oozie's SLA_EVENTS table at a regular interval (bulk fetch
on sequence id) through REST APIs.
Pros:
* Isolation from core. Ability to host as a separate service if needed for
performance.
* Ability to consolidate SLA information from multiple clusters in the future.
Just a thought; we might not do it.
Cons:
* More load on the Oozie core DB due to the frequent DB calls, even if we are
fetching based on sequence id.
* Possible slight delay in SLA calculations, as we do a bulk fetch at a
periodic interval instead of acting immediately on events.
* Dev work to create a separate service.
* Adds complexity to setup and deployment.
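
The periodic bulk fetch on sequence id in Design 2 could look roughly like the
sketch below. It is a rough illustration, not a proposal for the actual classes:
EventSource stands in for the REST call (whose endpoint and parameters are not
defined yet), and all names, the batch size and the row fields are assumptions.

```java
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: every name below is hypothetical, not Oozie API.
class SlaEventPollerSketch {

    /** One SLA_EVENTS row as the poller sees it (fields are illustrative). */
    static final class EventRow {
        final long seqId;
        final String payload;

        EventRow(long seqId, String payload) {
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    /**
     * Stand-in for the REST call. Assumed contract: returns rows with
     * seqId > afterSeqId in ascending order, at most maxRows of them.
     */
    interface EventSource {
        List<EventRow> fetchAfter(long afterSeqId, int maxRows);
    }

    private final EventSource source;
    private final Consumer<EventRow> handler;
    private final int batchSize;
    private long lastSeqId;

    SlaEventPollerSketch(EventSource source, Consumer<EventRow> handler,
                         int batchSize, long startAfterSeqId) {
        this.source = source;
        this.handler = handler;
        this.batchSize = batchSize;
        this.lastSeqId = startAfterSeqId;
    }

    /** Drains everything currently available and returns the number of rows handled. */
    int pollOnce() {
        int handled = 0;
        while (true) {
            List<EventRow> batch = source.fetchAfter(lastSeqId, batchSize);
            for (EventRow row : batch) {
                handler.accept(row);
                lastSeqId = row.seqId;  // advance the cursor only after the row is handled
                handled++;
            }
            if (batch.size() < batchSize) {
                return handled;         // a short batch means we have caught up
            }
        }
    }

    long lastSeqId() { return lastSeqId; }
}
```

A ScheduledExecutorService could call pollOnce() with a fixed delay. Persisting
lastSeqId somewhere would let the service resume after a restart without
re-reading the whole table.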
One option would be to have the code in a separate module, with build profiles
to package it either as part of the core Oozie war or as a separate war, making
it a deployment choice.
> SLA Support in Oozie
> --------------------
>
> Key: OOZIE-1244
> URL: https://issues.apache.org/jira/browse/OOZIE-1244
> Project: Oozie
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> Would like to have the following features in Oozie
> - JMS notifications on SLA met, SLA start miss, SLA end miss and SLA
> duration miss
> - Email alerting for SLA start miss, SLA end miss and SLA duration miss
> - API to query SLA met/miss information. Currently the SLA information that
> can be queried is only SLA registration event and job status events. One has
> to calculate the actual misses from those.
> - A simple dashboard to view and query the SLA met/miss information built on
> the API mentioned above.