pnowojski closed pull request #7097: [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException
URL: https://github.com/apache/flink/pull/7097
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/docs/dev/connectors/kafka.md b/docs/dev/connectors/kafka.md
index 0630c6ec7d6..351a4dc2d41 100644
--- a/docs/dev/connectors/kafka.md
+++ b/docs/dev/connectors/kafka.md
@@ -660,19 +660,6 @@ we recommend setting the number of retries to a higher value.
 **Note**: There is currently no transactional producer for Kafka, so Flink can not guarantee exactly-once delivery
 into a Kafka topic.
 
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data loss. In particular keep in mind the following Kafka settings:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to Kafka documentation
-  for more explanation.
-</div>
-
 #### Kafka 0.11 and newer
 
 With Flink's checkpointing enabled, the `FlinkKafkaProducer011` (`FlinkKafkaProducer` for Kafka >= 1.0.0 versions) can provide
@@ -690,21 +677,6 @@ chosen by passing appropriate `semantic` parameter to the `FlinkKafkaProducer011
 or `read_uncommitted` - the latter one is the default value) for any application consuming records
  from Kafka.
 
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data losses. In particular keep in mind about following properties
-  in Kafka config:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to the Kafka documentation
-  for more explanation.
-</div>
-
-
 ##### Caveats
 
 `Semantic.EXACTLY_ONCE` mode relies on the ability to commit transactions
@@ -831,4 +803,38 @@ A mismatch in service name between client and server configuration will cause th
 For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl}}/ops/config.html).
 You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further details on how Flink internally setups Kerberos-based security.
 
+## Troubleshooting
+
+<div class="alert alert-warning">
+If you have a problem with Kafka when using Flink, keep in mind that Flink only wraps
+<a href="https://kafka.apache.org/documentation/#consumerapi">KafkaConsumer</a> or
+<a href="https://kafka.apache.org/documentation/#producerapi">KafkaProducer</a>,
+so your problem might be independent of Flink and can sometimes be solved by upgrading Kafka brokers,
+reconfiguring Kafka brokers, or reconfiguring the <tt>KafkaConsumer</tt> or <tt>KafkaProducer</tt> in Flink.
+Some examples of common problems are listed below.
+</div>
+
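+For example, Kafka client settings can be adjusted through the properties object that is handed to the
+Flink connector and forwarded to the underlying client. The sketch below assumes the Kafka 0.11
+connector classes (`FlinkKafkaConsumer011`, `SimpleStringSchema`) and illustrative property values;
+adapt the class names and values to your connector version and setup:
+
+{% highlight java %}
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "localhost:9092");
+properties.setProperty("group.id", "test");
+// Additional entries are forwarded to the underlying KafkaConsumer, so Kafka client
+// settings can be tuned here without changing Flink itself (value is illustrative).
+properties.setProperty("request.timeout.ms", "60000");
+
+DataStream<String> stream = env.addSource(
+    new FlinkKafkaConsumer011<>("topic", new SimpleStringSchema(), properties));
+{% endhighlight %}
+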
+### Data loss
+
+Depending on your Kafka configuration, even after Kafka acknowledges
+writes you can still experience data loss. In particular, keep in mind the following
+properties in the Kafka configuration:
+
+- `acks`
+- `log.flush.interval.messages`
+- `log.flush.interval.ms`
+- `log.flush.*`
+
+Default values for the above options can easily lead to data loss.
+Please refer to the Kafka documentation for more explanation.
+
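+As a concrete illustration, the producer acknowledgement level can be tightened via the properties
+passed to the producer. This is only a sketch: `acks` is a producer-side client setting, while the
+`log.flush.*` options are broker settings and have to be changed in the broker configuration, not
+from Flink.
+
+{% highlight java %}
+Properties producerProperties = new Properties();
+producerProperties.setProperty("bootstrap.servers", "localhost:9092");
+// Wait for the full set of in-sync replicas to acknowledge each record.
+// The Kafka producer default ("1") acknowledges once the leader has the record,
+// which can lose acknowledged data if the leader fails before replication.
+producerProperties.setProperty("acks", "all");
+
+FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
+    "topic", new SimpleStringSchema(), producerProperties);
+{% endhighlight %}
+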
+### UnknownTopicOrPartitionException
+
+One possible cause of this error is that a new leader election is taking place,
+for example after or while a Kafka broker is being restarted.
+This is a retriable exception, so the Flink job should be able to restart and resume normal operation.
+It can also be circumvented by changing the `retries` property in the producer settings.
+However, this might cause reordering of messages,
+which, if undesired, can be avoided by setting `max.in.flight.requests.per.connection` to 1.
+
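+For example, both settings can be added to the properties passed to the producer (the values here
+are only illustrative):
+
+{% highlight java %}
+Properties producerProperties = new Properties();
+producerProperties.setProperty("bootstrap.servers", "localhost:9092");
+// Retry transient failures such as an ongoing leader election instead of
+// failing the sink immediately.
+producerProperties.setProperty("retries", "5");
+// Retries may reorder records; limiting in-flight requests to 1 preserves
+// ordering at the cost of some throughput.
+producerProperties.setProperty("max.in.flight.requests.per.connection", "1");
+{% endhighlight %}
+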
 {% top %}


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
