pnowojski closed pull request #7097: [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException
URL: https://github.com/apache/flink/pull/7097
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:
diff --git a/docs/dev/connectors/kafka.md b/docs/dev/connectors/kafka.md
index 0630c6ec7d6..351a4dc2d41 100644
--- a/docs/dev/connectors/kafka.md
+++ b/docs/dev/connectors/kafka.md
@@ -660,19 +660,6 @@ we recommend setting the number of retries to a higher value.
**Note**: There is currently no transactional producer for Kafka, so Flink can not guarantee exactly-once delivery into a Kafka topic.
-<div class="alert alert-warning">
- <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
- writes you can still experience data loss. In particular keep in mind the following Kafka settings:
- <ul>
- <li><tt>acks</tt></li>
- <li><tt>log.flush.interval.messages</tt></li>
- <li><tt>log.flush.interval.ms</tt></li>
- <li><tt>log.flush.*</tt></li>
- </ul>
- Default values for the above options can easily lead to data loss. Please refer to Kafka documentation
- for more explanation.
-</div>
-
#### Kafka 0.11 and newer
With Flink's checkpointing enabled, the `FlinkKafkaProducer011` (`FlinkKafkaProducer` for Kafka >= 1.0.0 versions) can provide
@@ -690,21 +677,6 @@ chosen by passing appropriate `semantic` parameter to the `FlinkKafkaProducer011
or `read_uncommitted` - the latter one is the default value) for any application consuming records
from Kafka.
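
For example, here is a minimal sketch of a consumer configured to read only
committed records; the topic name, group id and broker address are placeholder
values:

{% highlight java %}
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "my-group");
// Only read records from transactions that were successfully committed.
properties.setProperty("isolation.level", "read_committed");

FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
        "my-topic",
        new SimpleStringSchema(),
        properties);
{% endhighlight %}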
-<div class="alert alert-warning">
- <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
- writes you can still experience data losses. In particular keep in mind about following properties
- in Kafka config:
- <ul>
- <li><tt>acks</tt></li>
- <li><tt>log.flush.interval.messages</tt></li>
- <li><tt>log.flush.interval.ms</tt></li>
- <li><tt>log.flush.*</tt></li>
- </ul>
- Default values for the above options can easily lead to data loss. Please refer to the Kafka documentation
- for more explanation.
-</div>
-
-
##### Caveats
`Semantic.EXACTLY_ONCE` mode relies on the ability to commit transactions
@@ -831,4 +803,38 @@ A mismatch in service name between client and server configuration will cause th
For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl}}/ops/config.html).
You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further details on how Flink internally setups Kerberos-based security.
+## Troubleshooting
+
+<div class="alert alert-warning">
+If you have a problem with Kafka when using Flink, keep in mind that Flink only wraps
+<a href="https://kafka.apache.org/documentation/#consumerapi">KafkaConsumer</a> or
+<a href="https://kafka.apache.org/documentation/#producerapi">KafkaProducer</a>,
+so your problem might be independent of Flink. It can sometimes be solved by upgrading Kafka brokers,
+reconfiguring the Kafka brokers, or reconfiguring the <tt>KafkaConsumer</tt> or <tt>KafkaProducer</tt> in Flink.
+Some examples of common problems are listed below.
+</div>
+
+### Data loss
+
+Depending on your Kafka configuration, even after Kafka acknowledges
+writes, you can still experience data loss. In particular, keep in mind the following
+properties in the Kafka config:
+
+- `acks`
+- `log.flush.interval.messages`
+- `log.flush.interval.ms`
+- `log.flush.*`
+
+Default values for the above options can easily lead to data loss.
+Please refer to the Kafka documentation for more explanation.
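+
+For illustration, here is a minimal sketch of a producer configured to wait for
+acknowledgement from all in-sync replicas. The topic name and broker address are
+placeholder values; note that `acks` is a producer setting, while the `log.flush.*`
+options are broker settings that must be changed in the broker configuration.
+
+{% highlight java %}
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "localhost:9092");
+// Wait until the write is acknowledged by all in-sync replicas,
+// not only by the partition leader.
+properties.setProperty("acks", "all");
+
+FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
+        "my-topic",                 // placeholder topic name
+        new SimpleStringSchema(),
+        properties);
+{% endhighlight %}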
+
+### UnknownTopicOrPartitionException
+
+One possible cause of this error is an ongoing leader election,
+for example after or during a restart of a Kafka broker.
+This is a retriable exception, so the Flink job should be able to restart and resume normal operation.
+It can also be circumvented by increasing the `retries` property in the producer settings.
+However, this might cause reordering of messages,
+which in turn, if undesired, can be prevented by setting `max.in.flight.requests.per.connection` to 1.
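+
+As a minimal sketch, such producer settings might look as follows; the broker
+address and retry count are placeholder values:
+
+{% highlight java %}
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "localhost:9092");
+// Retry transient failures such as an ongoing leader election.
+properties.setProperty("retries", "5");
+// Prevent the message reordering that retries could otherwise introduce.
+properties.setProperty("max.in.flight.requests.per.connection", "1");
+{% endhighlight %}
+
+Setting `max.in.flight.requests.per.connection` to 1 trades some throughput
+for the ordering guarantee.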
+
{% top %}
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services