[ https://issues.apache.org/jira/browse/FLINK-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875825#comment-15875825 ]
ASF GitHub Bot commented on FLINK-5487:
---------------------------------------

Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3358#discussion_r102186137

--- Diff: docs/dev/connectors/elasticsearch.md ---
@@ -272,9 +308,115 @@ input.addSink(new ElasticsearchSink(config, new ElasticsearchSinkFunction[String
 The difference is that now we do not need to provide a list of addresses of Elasticsearch nodes.
+### Handling Failing Elasticsearch Requests
+
+Elasticsearch action requests may fail for a variety of reasons, including
+temporarily saturated node queue capacity or malformed documents to be indexed.
+The Flink Elasticsearch Sink allows the user to specify how request
+failures are handled by implementing an `ActionRequestFailureHandler` and
+providing it to the constructor.
+
+Below is an example:
+
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+{% highlight java %}
+DataStream<String> input = ...;
+
+input.addSink(new ElasticsearchSink<>(
+    config, transportAddresses,
+    new ElasticsearchSinkFunction<String>() {...},
+    new ActionRequestFailureHandler() {
+        @Override
+        public boolean onFailure(ActionRequest action, Throwable failure, RequestIndexer indexer) {
+            // this example uses Apache Commons to search for nested exceptions
+
+            if (ExceptionUtils.indexOfThrowable(failure, EsRejectedExecutionException.class) >= 0) {
+                // full queue; re-add document for indexing
+                indexer.add(action);
+                return false;
+            } else if (ExceptionUtils.indexOfThrowable(failure, ElasticsearchParseException.class) >= 0) {
+                // malformed document; simply drop request without failing sink
+                return false;
+            } else {
+                // for all other failures, fail the sink
+                return true;
+            }
+        }
+}));
+{% endhighlight %}
+</div>
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+val input: DataStream[String] = ...
+
+input.addSink(new ElasticsearchSink(
+  config, transportAddresses,
+  new ElasticsearchSinkFunction[String] {...},
+  new ActionRequestFailureHandler {
+    override def onFailure(action: ActionRequest, failure: Throwable, indexer: RequestIndexer): Boolean = {
+      // this example uses Apache Commons to search for nested exceptions
+
+      if (ExceptionUtils.indexOfThrowable(failure, classOf[EsRejectedExecutionException]) >= 0) {
+        // full queue; re-add document for indexing
+        indexer.add(action)
+        false
+      } else if (ExceptionUtils.indexOfThrowable(failure, classOf[ElasticsearchParseException]) >= 0) {
+        // malformed document; simply drop request without failing sink
+        false
+      } else {
+        // for all other failures, fail the sink
+        true
+      }
+    }
+}))
+{% endhighlight %}
+</div>
+</div>
+
+The above example will let the sink re-add requests that failed due to
+queue capacity saturation and drop requests with malformed documents, without
+failing the sink. For all other failures, the sink will fail. If an `ActionRequestFailureHandler`
+is not provided to the constructor, the sink will fail for any kind of error.
+
+Note that `onFailure` is called only for failures that still occur after the
+`BulkProcessor` internally finishes all backoff retry attempts.
+By default, the `BulkProcessor` retries up to a maximum of 8 attempts with
+exponential backoff. For more information on the behaviour of the
+internal `BulkProcessor` and how to configure it, please see the following section.
+
+<p style="border-radius: 5px; padding: 5px" class="bg-danger">
+<b>IMPORTANT</b>: Re-adding requests to the internal <b>BulkProcessor</b>
+on failures will lead to longer checkpoints, as the sink will also
+need to wait for the re-added requests to be flushed when checkpointing.
+This also means that if re-added requests never succeed, the checkpoint will
+never finish.
+</p>
--- End diff --

I wouldn't suggest adding an `ActionRequestFailureHandler` that out-of-the-box retries for all exceptions, though. That could let users easily overlook exceptions that simply cannot be retried without custom logic (for example, malformed documents with wrong field types).

> Proper at-least-once support for ElasticsearchSink
> --------------------------------------------------
>
>                 Key: FLINK-5487
>                 URL: https://issues.apache.org/jira/browse/FLINK-5487
>             Project: Flink
>          Issue Type: Bug
>      Components: Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>
> Discussion in ML:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fault-tolerance-guarantees-of-Elasticsearch-sink-in-flink-elasticsearch2-td10982.html
> Currently, the Elasticsearch Sink doesn't actually offer any guarantees for message delivery.
> For proper at-least-once support, the sink will need to participate in Flink's checkpointing: when snapshotting is triggered at the {{ElasticsearchSink}}, we need to synchronize on the pending ES requests by flushing the internal bulk processor. For temporary ES failures (see FLINK-5122) that may happen on the flush, we should retry them before returning from snapshotting and acking the checkpoint. If there are non-temporary ES failures on the flush, the current snapshot should fail.
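
To make the flush-on-snapshot behaviour described in the issue above concrete, below is a minimal sketch of how a sink could synchronize on pending requests during checkpointing. This is an illustration, not the actual implementation: the {{numPendingRequests}} counter, the {{asyncFailure}} holder, and the class itself are hypothetical; only {{CheckpointedFunction}} (Flink) and {{BulkProcessor.flush()}} (Elasticsearch client) are real APIs.

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.elasticsearch.action.bulk.BulkProcessor;

/** Sketch only: names and bookkeeping are illustrative, not Flink's actual code. */
abstract class FlushOnCheckpointSinkSketch implements CheckpointedFunction {

    // the Elasticsearch client's bulk processor, created when the sink opens
    protected transient BulkProcessor bulkProcessor;

    // hypothetical bookkeeping: incremented when a request is added,
    // decremented by the BulkProcessor listener when a bulk response arrives
    protected final AtomicLong numPendingRequests = new AtomicLong();

    // hypothetical holder: set by the listener on a non-temporary failure
    protected final AtomicReference<Throwable> asyncFailure = new AtomicReference<>();

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // push out all currently buffered action requests
        bulkProcessor.flush();

        // block the checkpoint until all in-flight requests are acknowledged
        // (simplified busy-wait; a real implementation would use proper signalling)
        while (numPendingRequests.get() > 0) {
            Thread.sleep(10);
        }

        // a non-temporary failure observed during the flush fails the snapshot
        Throwable failure = asyncFailure.get();
        if (failure != null) {
            throw new RuntimeException("Flushing Elasticsearch requests failed", failure);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // nothing to restore in this sketch
    }
}
{code}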
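
The review comment above argues against a handler that blindly retries every exception. As an illustration of the alternative (not part of the PR), a handler that retries only a known-transient failure type and fails the sink for everything else could look like the following, assuming the {{ActionRequestFailureHandler}} interface shown in the diff and Apache Commons Lang's {{ExceptionUtils}} on the classpath:

{code:java}
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.elasticsearch.action.ActionRequest;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

/** Hypothetical handler: retries only bulk-queue rejections, fails fast otherwise. */
public class RetryRejectedExecutionOnlyHandler implements ActionRequestFailureHandler {

    @Override
    public boolean onFailure(ActionRequest action, Throwable failure, RequestIndexer indexer) {
        if (ExceptionUtils.indexOfThrowable(failure, EsRejectedExecutionException.class) >= 0) {
            // transient queue saturation: re-add the request and keep the sink alive
            indexer.add(action);
            return false;
        }
        // everything else (e.g. malformed documents) cannot be blindly retried:
        // fail the sink so the problem surfaces instead of looping forever
        return true;
    }
}
{code}

Keeping the retried set narrow also limits the checkpoint-duration risk called out in the IMPORTANT note in the diff, since only requests with a realistic chance of succeeding are ever re-added.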
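
Finally, since the diff points readers to a follow-up section on configuring the internal {{BulkProcessor}}, here is a hedged sketch of what that configuration looks like through the sink's config map. The {{bulk.flush.max.actions}}, {{bulk.flush.max.size.mb}}, and {{bulk.flush.interval.ms}} keys are the ones documented for the connector; the backoff-related keys are the subject of this PR, so their exact names below are assumptions.

{code:java}
import java.util.HashMap;
import java.util.Map;

Map<String, String> config = new HashMap<>();
config.put("cluster.name", "my-cluster-name");

// flush the bulk processor after 500 buffered actions, 5 MB of data,
// or 60 seconds, whichever comes first
config.put("bulk.flush.max.actions", "500");
config.put("bulk.flush.max.size.mb", "5");
config.put("bulk.flush.interval.ms", "60000");

// backoff retry settings (introduced by this PR; key names are assumptions here):
// config.put("bulk.flush.backoff.enable", "true");
// config.put("bulk.flush.backoff.retries", "8");
{code}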