[ 
https://issues.apache.org/jira/browse/OOZIE-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705689#comment-14705689
 ] 

Purshotam Shah commented on OOZIE-2245:
---------------------------------------

A few minor comments.

1. We can add this to the Oozie health page too. If there is a schema error,
Oozie health should indicate an error.
Oozie health will be useful as a place to check the status of each Oozie
component, such as HDFS, sharelib, ZK, and DB.
After an upgrade or maintenance, admin/ops can use Oozie health to validate
each component.
Besides reporting the status of each component, it also reports the time taken
to validate each component. That can be useful for finding slowness (DB, ZK,
or HDFS slowness).
This can be done once we check in
https://issues.apache.org/jira/browse/OOZIE-2306.

2. One suggestion: can you add support for sending mail if there is a schema
error? Otherwise admin/ops have to keep checking the instrumentation and then
the logs to find the issue.
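
To make the two suggestions above concrete, here is a rough sketch of a
health check that records the status and the time taken for each component,
and calls a notifier when a component fails. It is only an illustration
under assumptions: HealthCheckSketch, Result, runChecks and the notifier
callback are hypothetical names, not existing Oozie classes or the design in
OOZIE-2306, and the notifier stands in for whatever would actually send mail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch, not Oozie code: time each component check and
// report failures through a pluggable notifier (e.g. a mail sender).
public class HealthCheckSketch {

    /** Outcome of one component check: status, duration, and failure detail. */
    public static final class Result {
        public final boolean healthy;
        public final long elapsedMillis;
        public final String detail;

        Result(boolean healthy, long elapsedMillis, String detail) {
            this.healthy = healthy;
            this.elapsedMillis = elapsedMillis;
            this.detail = detail;
        }
    }

    /**
     * Runs every check in insertion order and times each one. A check that
     * returns false or throws is recorded as unhealthy and passed to the
     * notifier.
     */
    public static Map<String, Result> runChecks(
            Map<String, Supplier<Boolean>> checks, Consumer<String> notifier) {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Boolean>> entry : checks.entrySet()) {
            long start = System.nanoTime();
            boolean ok;
            String detail = "";
            try {
                ok = entry.getValue().get();
            } catch (RuntimeException ex) {
                ok = false;
                detail = String.valueOf(ex.getMessage());
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            results.put(entry.getKey(), new Result(ok, elapsedMillis, detail));
            if (!ok) {
                notifier.accept(entry.getKey() + " check failed: " + detail);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder checks; real ones would probe HDFS, ZK, the DB schema, etc.
        Map<String, Supplier<Boolean>> checks = new LinkedHashMap<>();
        checks.put("hdfs", () -> true);
        checks.put("db-schema", () -> {
            throw new IllegalStateException("missing index");
        });

        // The notifier just prints here; a mail sender could be plugged in instead.
        Map<String, Result> results = runChecks(checks, System.err::println);
        for (Map.Entry<String, Result> e : results.entrySet()) {
            System.out.println(e.getKey() + " healthy=" + e.getValue().healthy
                    + " elapsedMillis=" + e.getValue().elapsedMillis);
        }
    }
}
```

Keeping the notifier as a plain callback means the schema check itself does
not need to know whether failures go to mail, the log, or the health page.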

> Service to periodically check database schema
> ---------------------------------------------
>
>                 Key: OOZIE-2245
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2245
>             Project: Oozie
>          Issue Type: New Feature
>          Components: core
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: OOZIE-2245.002.patch, OOZIE-2245.003.patch, 
> OOZIE-2245.patch
>
>
> We've seen a number of issues related to the database schema being incorrect 
> (more than you would think).  It seems some users go and muck around in the 
> Oozie database, adding/removing columns and indexes, changing the default 
> value of columns, etc.  The issues caused by this can be very difficult to 
> track down because their cause is not obvious and we generally assume the 
> database schema is correct.  For example, we saw an issue where Oozie was 
> taking a long time to create Coordinator actions, and it turned out that the 
> cause was that some indexes were missing, which made the Purge queries slow, 
> which slowed down the whole database whenever the PurgeService ran.  Another 
> example was that the pause time was automatically being set whenever a 
> Coordinator job was submitted, because the default value for the column was 
> incorrect.
> We should create a Service which periodically runs and checks that the schema 
> is correct.  It can output details about what's wrong to the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
