[ https://issues.apache.org/jira/browse/GOBBLIN-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhixiong Chen updated GOBBLIN-273:
----------------------------------
Component/s: gobblin-core
> Add failure monitoring
> ----------------------
>
> Key: GOBBLIN-273
> URL: https://issues.apache.org/jira/browse/GOBBLIN-273
> Project: Apache Gobblin
> Issue Type: Task
> Components: gobblin-core
> Reporter: Zhixiong Chen
> Assignee: Zhixiong Chen
>
> When a job fails with a very long log, it is hard to dig through the log
> and find the cause of the failure. This change plugs a reporter into the
> Gobblin Metrics architecture to collect job failure events into a file. A job
> now gets task-level and dataset-level failure events reported for free.
> h3. `MetricContext#submitFailureEvent`
> When a failure event needs to be reported, it should be submitted with this
> method, which encapsulates the event into a `FailureEventNotification`.
> h3. `FileFailureEventReporter`
> Reports all failure events to a file. Each job has its own report folder.
> h3. Configurations
> To enable job failure reporting, the following configurations are required:
> {code:java}
> metrics.enabled=true
> # By default, the local file system is used
> fs.uri=<file system uri>
> failure.log.dir=<root folder of all jobs failure reports>
> {code}
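To make the description above concrete, here is a minimal sketch of the idea: check the two required keys, then append each failure event to a file inside a per-job folder. It is not the Gobblin implementation. `FailureReportSketch`, `reportingEnabled`, `appendFailure`, the `failures.log` file name, and the tab-separated line format are all hypothetical names chosen for illustration; only the keys `metrics.enabled` and `failure.log.dir` come from the issue. The sketch writes to the local file system and ignores `fs.uri`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Properties;

// Hypothetical sketch; none of these names are Gobblin classes or methods.
final class FailureReportSketch {

    // Configuration keys taken from the issue description.
    static final String METRICS_ENABLED = "metrics.enabled";
    static final String FAILURE_LOG_DIR = "failure.log.dir";

    /** Returns true only when metrics are enabled and a report root folder is set. */
    static boolean reportingEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(METRICS_ENABLED, "false"))
                && props.getProperty(FAILURE_LOG_DIR) != null;
    }

    /**
     * Appends one failure event as a single line to root/jobId/failures.log.
     * The file name and line layout are assumptions made for this sketch.
     */
    static Path appendFailure(Path root, String jobId, String scope, String message)
            throws IOException {
        // Each job gets its own report folder under the configured root.
        Path jobDir = Files.createDirectories(root.resolve(jobId));
        Path report = jobDir.resolve("failures.log");
        // Flatten newlines so that one event always occupies one line.
        String line = scope + "\t" + message.replace('\n', ' ') + System.lineSeparator();
        Files.write(report, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return report;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty(METRICS_ENABLED, "true");
        props.setProperty(FAILURE_LOG_DIR,
                Files.createTempDirectory("failure-reports").toString());

        if (reportingEnabled(props)) {
            Path root = Paths.get(props.getProperty(FAILURE_LOG_DIR));
            // Task-level and dataset-level events land in the same job folder.
            appendFailure(root, "job_example", "task", "Task failed: connection reset");
            Path report = appendFailure(root, "job_example", "dataset", "Dataset commit failed");
            System.out.println("Failure report written to " + report);
        }
    }
}
```

Keeping one folder per job means a failed job's events can be read without searching the full job log, which is the motivation stated in the description.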