[
https://issues.apache.org/jira/browse/OOZIE-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705689#comment-14705689
]
Purshotam Shah commented on OOZIE-2245:
---------------------------------------
A few minor comments.
1. We can add this to the Oozie health page too. If there is a schema error,
Oozie health should indicate an error.
Oozie health will be useful as a single place to check the status of each
Oozie component, such as HDFS, sharelib, ZK, and DB.
After an upgrade or maintenance, admin/ops can use Oozie health to validate
each component.
Besides reporting the status of each component, it also reports the time
taken to validate each component. That can be useful for finding slowness
(DB, ZK, or HDFS slowness).
This can be done once we check in
https://issues.apache.org/jira/browse/OOZIE-2306.
2. One suggestion: can you add support for sending mail if there is a schema
error? Otherwise admin/ops have to keep checking instrumentation and then
the log to find the issue.
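
To make the health idea in comment 1 concrete, here is a minimal sketch of a per-component check that records both status and time taken. It is only an illustration: the class HealthReport, its Result type, and the register/run methods are hypothetical names and are not part of Oozie or of the OOZIE-2306 patch. The probes in main are stand-ins for real DB, ZK, HDFS, or schema checks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch, not Oozie code: runs one probe per component and
// records whether it passed and how long it took.
public class HealthReport {

    /** Outcome of probing one component. */
    public static final class Result {
        public final boolean healthy;
        public final long elapsedMillis;
        public final String error; // null when the probe did not throw

        Result(boolean healthy, long elapsedMillis, String error) {
            this.healthy = healthy;
            this.elapsedMillis = elapsedMillis;
            this.error = error;
        }
    }

    // Insertion-ordered so the report lists components in registration order.
    private final Map<String, Callable<Boolean>> probes = new LinkedHashMap<>();

    public void register(String component, Callable<Boolean> probe) {
        probes.put(component, probe);
    }

    /** Runs every probe; a probe that throws is reported as unhealthy. */
    public Map<String, Result> run() {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<Boolean>> entry : probes.entrySet()) {
            long start = System.nanoTime();
            boolean ok;
            String error = null;
            try {
                ok = entry.getValue().call();
            } catch (Exception ex) {
                ok = false;
                error = ex.toString();
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            results.put(entry.getKey(), new Result(ok, elapsedMillis, error));
        }
        return results;
    }

    public static void main(String[] args) {
        HealthReport report = new HealthReport();
        // Placeholder probes; real ones would query the DB, ZK, HDFS, etc.
        report.register("DB", () -> true);
        report.register("ZK", () -> {
            throw new IllegalStateException("connection refused");
        });
        for (Map.Entry<String, Result> e : report.run().entrySet()) {
            Result r = e.getValue();
            System.out.println(e.getKey() + ": "
                    + (r.healthy ? "OK" : "ERROR")
                    + " (" + r.elapsedMillis + " ms)"
                    + (r.error != null ? " " + r.error : ""));
        }
    }
}
```

A schema check could be registered as one more probe, and a mail notification (comment 2) could be triggered for any Result that is not healthy.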
> Service to periodically check database schema
> ---------------------------------------------
>
> Key: OOZIE-2245
> URL: https://issues.apache.org/jira/browse/OOZIE-2245
> Project: Oozie
> Issue Type: New Feature
> Components: core
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: OOZIE-2245.002.patch, OOZIE-2245.003.patch,
> OOZIE-2245.patch
>
>
> We've seen a number of issues related to the database schema being incorrect
> (more than you would think). It seems some users go and muck around in the
> Oozie database, adding/removing columns and indexes, changing the default
> value of columns, etc. The issues caused by this can be very difficult to
> track down because their cause is not obvious and we generally assume the
> database schema is correct. For example, we saw an issue where Oozie was
> taking a long time to create Coordinator actions, and it turned out that the
> cause was that some indexes were missing, which made the Purge queries slow,
> which slowed down the whole database whenever the PurgeService ran. Another
> example was that the pause time was automatically being set whenever a
> Coordinator job was submitted, because the default value for the column was
> incorrect.
> We should create a Service which periodically runs and checks that the schema
> is correct. It can output details about what's wrong to the log.