[ https://issues.apache.org/jira/browse/FLINK-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687774#comment-16687774 ]
ASF GitHub Bot commented on FLINK-10874:
----------------------------------------

pnowojski commented on a change in pull request #7097: [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException
URL: https://github.com/apache/flink/pull/7097#discussion_r233788608

##########
File path: docs/dev/connectors/kafka.md
##########

@@ -804,4 +776,36 @@ When using standalone Flink deployment, you can also use `SASL_SSL`; please see
 For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl}}/ops/config.html).
 You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further details on how Flink internally sets up Kerberos-based security.
+## Troubleshooting
+
+<div class="alert alert-warning">
+If you run into a problem with Kafka when using Flink, keep in mind that Flink only wraps <tt>KafkaConsumer</tt> or <tt>KafkaProducer</tt>,
+so your problem might be independent of Flink and can sometimes be solved by upgrading the Kafka brokers,
+reconfiguring the Kafka brokers, or reconfiguring <tt>KafkaConsumer</tt> or <tt>KafkaProducer</tt> in Flink.
+Some examples of common problems are listed below.
+</div>
+
+### Data loss
+
+Depending on your Kafka configuration, you can still experience data loss even after
+Kafka has acknowledged writes. In particular, keep in mind the following properties
+in the Kafka configuration:
+
+- `acks`
+- `log.flush.interval.messages`
+- `log.flush.interval.ms`
+- `log.flush.*`
+
+The default values for the above options can easily lead to data loss.
+Please refer to the Kafka documentation for further explanation.
+
+### UnknownTopicOrPartitionException
+
+One possible cause of this error is an ongoing leader election,
+for example while a Kafka broker is being restarted.
+This is a retriable exception, so the Flink job should be able to restart and resume normal operation.

Review comment:
   Dunno, it would require some independent work to investigate it.
   I don't know how severe or frequent that issue is, to weigh its priority. Probably not very frequent.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka 2.0 connector testMigrateFromAtLeastOnceToExactlyOnce failure
> -------------------------------------------------------------------
>
>                 Key: FLINK-10874
>                 URL: https://issues.apache.org/jira/browse/FLINK-10874
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.8.0
>            Reporter: Piotr Nowojski
>          Priority: Critical
>              Labels: pull-request-available
>
> https://api.travis-ci.org/v3/job/454449444/log.txt
> {noformat}
> Test testMigrateFromAtLeastOnceToExactlyOnce(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) is running.
> --------------------------------------------------------------------------------
> 16:35:07,894 WARN  org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer  - Property [transaction.timeout.ms] not specified. Setting it to 3600000 ms
> 16:35:07,903 INFO  org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer  - Starting FlinkKafkaInternalProducer (1/1) to produce into default topic testMigrateFromAtLeastOnceToExactlyOnce
> 16:35:08,785 ERROR org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase  - --------------------------------------------------------------------------------
> Test testMigrateFromAtLeastOnceToExactlyOnce(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) failed with:
> java.lang.Exception: Could not complete snapshot 0 for operator MockTask (1/1).
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:419)
> 	at org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshotWithLocalState(AbstractStreamOperatorTestHarness.java:505)
> 	at org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshot(AbstractStreamOperatorTestHarness.java:497)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testRecoverWithChangeSemantics(FlinkKafkaProducerITCase.java:591)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testMigrateFromAtLeastOnceToExactlyOnce(FlinkKafkaProducerITCase.java:569)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: This server does not host this topic-partition.
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:993)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.flush(FlinkKafkaProducer.java:778)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.preCommit(FlinkKafkaProducer.java:705)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.preCommit(FlinkKafkaProducer.java:94)
> 	at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:291)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.snapshotState(FlinkKafkaProducer.java:783)
> 	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
> 	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
> 	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:395)
> 	... 36 more
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
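The "Data loss" note in the documented troubleshooting section lists `acks` among the properties whose defaults can lose acknowledged writes. As a hedged sketch (the class name, method name, and values below are illustrative assumptions, not Flink defaults), the client-side part of a durability-oriented producer configuration could be assembled like this before handing the `Properties` to Flink's Kafka producer; note that the `log.flush.*` options are broker-side settings and cannot be set from the client:

```java
import java.util.Properties;

// Hypothetical sketch: durability-oriented client-side settings for a
// Kafka producer used from Flink. Property keys are standard Kafka
// producer configs; the values are illustrative, not Flink's defaults.
public class KafkaDurabilityConfig {
    public static Properties durableProducerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // acks=all: wait until the full in-sync replica set has
        // acknowledged the write, instead of the leader alone; weaker
        // settings can lose acknowledged writes on leader failover.
        props.setProperty("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        Properties props = durableProducerProperties("localhost:9092");
        System.out.println(props.getProperty("acks"));
    }
}
```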
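The "UnknownTopicOrPartitionException" note calls the exception retriable: a failed attempt during a leader election can simply be repeated once the election settles. In Flink this is handled by the job's configured restart strategy, not by user code; purely as an illustration of the retry idea, a minimal generic retry-with-backoff helper (all names here are hypothetical) might look like:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of retrying a transient (retriable) failure with
// exponential backoff. This is NOT Flink's mechanism; Flink restarts the
// whole job according to its restart strategy.
public class RetryWithBackoff {
    public static <T> T retry(Callable<T> action, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // double the wait between attempts
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice (simulating an in-progress leader election), then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("This server does not host this topic-partition.");
            }
            return "sent";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```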