dmvk commented on code in PR #22468:
URL: https://github.com/apache/flink/pull/22468#discussion_r1183548843


##########
docs/content/docs/deployment/advanced/failure_enrichers.md:
##########
@@ -0,0 +1,53 @@
+---
+title: "Pluggable Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink enables users to extend the default failure handling behavior using the plugin framework.
+
+## Custom failure enrichers
+Flink provides a pluggable failure enricher interface for users to register their custom logic.
+The goal is to give flexibility to developers who can now implement their own plugins to categorize job failures, expose custom metrics, make calls to external notification systems, and more.
+Failure enrichers are triggered every time an exception is reported at runtime by the job manager.
+Every failure enricher may optionally return labels (kv string pairs) associated with the failure that are then exposed via the job manager's Rest interface (e.g., a 'System' tag implying the failure considered as a system error).
+For instance, when a Flink runtime failure occurs caused by network error, we can increment the appropriate counter.
+With accurate metrics, we now have better insight of platform level metrics e.g., network failures, platform reliability, etc.
+The default CountingFailureEnricher just records the failure count and then emits the metric "numJobFailure" for the job.
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom plugin for your use-case:
+
+- Add your own FailureEnricher by implementing the `org.apache.flink.core.failure.FailureEnricher` interface.
+
+- Add your own FailureEnricherFactory by implementing the `org.apache.flink.core.failure.FailureEnricherFactory` interface.

Review Comment:
   It would be neat to add links to the Javadocs here.



##########
docs/content/docs/deployment/advanced/failure_enrichers.md:
##########
@@ -0,0 +1,53 @@
+---
+title: "Pluggable Failure Enrichers"
+nav-title: failure-enrichers
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink enables users to extend the default failure handling behavior using the plugin framework.
+
+## Custom failure enrichers
+Flink provides a pluggable failure enricher interface for users to register their custom logic.
+The goal is to give flexibility to developers who can now implement their own plugins to categorize job failures, expose custom metrics, make calls to external notification systems, and more.
+Failure enrichers are triggered every time an exception is reported at runtime by the job manager.
+Every failure enricher may optionally return labels (kv string pairs) associated with the failure that are then exposed via the job manager's Rest interface (e.g., a 'System' tag implying the failure considered as a system error).
+For instance, when a Flink runtime failure occurs caused by network error, we can increment the appropriate counter.
+With accurate metrics, we now have better insight of platform level metrics e.g., network failures, platform reliability, etc.
+The default CountingFailureEnricher just records the failure count and then emits the metric "numJobFailure" for the job.
+
+
+### Implement a plugin for your custom enricher
+
+To implement a custom plugin for your use-case:
+
+- Add your own FailureEnricher by implementing the `org.apache.flink.core.failure.FailureEnricher` interface.
+
+- Add your own FailureEnricherFactory by implementing the `org.apache.flink.core.failure.FailureEnricherFactory` interface.
+
+- Add a service entry. Create a file `META-INF/services/org.apache.flink.core.failure.FailureEnricherFactory`
+  which contains the class name of your exception classifier factory class (see the [Java Service Loader](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html) docs for more details).
+
+
+Then, create a jar which includes your `FailureEnricher`, `FailureEnricherFactory`, `META-INF/services/` and all the external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary name, e.g. "exception-classification", and put the jar into this directory.
+See [Flink Plugin]({% link deployment/filesystems/plugins.md %}) for more details.
+
+As a plugin will be loaded in different classloader, log4j is not able to initialized correctly in your plugin. In this case,
+you should add the config below in `flink-conf.yaml`:
+- `plugin.classloader.parent-first-patterns.additional: org.slf4j`

Review Comment:
   Also, `org.apache.flink.configuration.CoreOptions#PARENT_FIRST_LOGGING_PATTERNS` already contains `org.slf4j`, so the additional pattern should not be needed.
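
   To illustrate why the extra pattern looks redundant, here is a minimal, self-contained sketch of prefix-based parent-first matching. It does not call Flink: the class name `ParentFirstCheck`, the helper `isParentFirst`, and the hard-coded pattern list are all hypothetical, and the list is only an assumed copy of what `PARENT_FIRST_LOGGING_PATTERNS` contains, so please verify it against the Flink version in use.

```java
import java.util.Arrays;
import java.util.List;

public class ParentFirstCheck {
    // ASSUMPTION: an illustrative copy of the logging prefixes in
    // CoreOptions#PARENT_FIRST_LOGGING_PATTERNS; check the real constant.
    static final List<String> LOGGING_PATTERNS = Arrays.asList(
            "org.slf4j",
            "org.apache.log4j",
            "org.apache.logging",
            "org.apache.commons.logging",
            "ch.qos.logback");

    // Hypothetical helper: a class is resolved parent-first when its
    // fully qualified name starts with one of the configured prefixes.
    static boolean isParentFirst(String className, List<String> patterns) {
        for (String pattern : patterns) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A logging class matches one of the assumed default prefixes.
        System.out.println(isParentFirst("org.slf4j.Logger", LOGGING_PATTERNS));
        // A plugin's own class does not match and would stay plugin-local.
        System.out.println(isParentFirst("com.example.MyEnricher", LOGGING_PATTERNS));
    }
}
```

   If the defaults are as assumed above, `org.slf4j.Logger` is already resolved through the parent classloader without any `additional` entry.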




Review Comment:
   This feels weird; you shouldn't bundle slf4j in the plugin.


