[ https://issues.apache.org/jira/browse/FLINK-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931704#comment-16931704 ]
Jeffrey Martin commented on FLINK-14076:
----------------------------------------

It looks like this is due to a KafkaException getting thrown while Flink is trying to checkpoint. The KafkaException gets wrapped in a 'DeclineCheckpoint' and sent back to the JobManager. The JobManager can't deserialize it because the JobManager's classpath does not include org.apache.kafka:kafka-clients by default.

Gonna dig in some more and see if I can figure out which exception is at fault.

> 'ClassNotFoundException: KafkaException' on Flink v1.9 w/ checkpointing
> -----------------------------------------------------------------------
>
>                 Key: FLINK-14076
>                 URL: https://issues.apache.org/jira/browse/FLINK-14076
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.9.0
>            Reporter: Jeffrey Martin
>            Priority: Major
>         Attachments: error.txt
>
>
> A Flink job that worked with checkpointing on a Flink v1.8.0 cluster fails on
> a Flink v1.9.0 cluster with checkpointing. It works on a Flink v1.9.0 cluster
> _without_ checkpointing. It is specifically _enabling checkpointing on
> v1.9.0_ that causes the JM to start throwing ClassNotFoundExceptions. Full
> stacktrace: [^error.txt]
> The job reads from Kafka via FlinkKafkaConsumer and writes to Kafka via
> FlinkKafkaProducer.
> The jobmanagers and taskmanagers are standalone.
> The exception is being raised deep in some Flink serialization code, so I'm
> not sure how to go about stepping through this in a debugger. The issue is
> happening in an internal repository at my job, but I can try to get a minimal
> repro on GitHub if it's not obvious from the error message alone what's
> broken.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
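
The failure mode described in the comment (an exception type that only the sender knows about, wrapped in a message and deserialized by a receiver that lacks the class) can be sketched with plain Java serialization. This is only an illustration under stated assumptions: ConnectorOnlyException, DeclineMessage, and ReceiverStream are hypothetical stand-ins, not Flink's or Kafka's actual classes, and the "missing classpath entry" is simulated by overriding resolveClass. The flatten helper shows one possible mitigation, not necessarily how the issue was or should be resolved in Flink.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class DeclineCheckpointDemo {

    // Hypothetical stand-in for an exception that lives only in a connector jar
    // (the role KafkaException plays in the report).
    static class ConnectorOnlyException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        ConnectorOnlyException(String message) { super(message); }
    }

    // Hypothetical stand-in for a decline message that carries the failure cause.
    static class DeclineMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final Throwable reason;
        DeclineMessage(Throwable reason) { this.reason = reason; }
    }

    // Simulates a receiver whose classpath does not contain the connector classes.
    static class ReceiverStream extends ObjectInputStream {
        ReceiverStream(InputStream in) throws IOException { super(in); }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (desc.getName().equals(ConnectorOnlyException.class.getName())) {
                throw new ClassNotFoundException(desc.getName());
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object message) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        }
        return bytes.toByteArray();
    }

    // Returns false when the simulated receiver cannot resolve a class in the message.
    static boolean receiverCanRead(Object message) {
        try {
            byte[] wire = serialize(message);
            try (ObjectInputStream in = new ReceiverStream(new ByteArrayInputStream(wire))) {
                in.readObject();
            }
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Possible mitigation: replace the connector-specific exception with a JDK-only
    // one that keeps the original class name, message, and stack trace.
    static Throwable flatten(Throwable t) {
        RuntimeException flat =
                new RuntimeException(t.getClass().getName() + ": " + t.getMessage());
        flat.setStackTrace(t.getStackTrace());
        return flat;
    }

    public static void main(String[] args) {
        Throwable cause = new ConnectorOnlyException("broker unreachable");
        System.out.println("raw decline readable on receiver: "
                + receiverCanRead(new DeclineMessage(cause)));
        System.out.println("flattened decline readable on receiver: "
                + receiverCanRead(new DeclineMessage(flatten(cause))));
    }
}
```

In this sketch the receiver fails on the raw message even though DeclineMessage itself is resolvable, because the nested exception class is not. The other route suggested by the comment is to make the missing dependency (org.apache.kafka:kafka-clients) visible to the JobManager.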