Hi Gyula,

I'm not aware of any recent issues with the Kafka Producer. However, there
was one with the Kafka Consumer that prevented proper cancellation
(https://issues.apache.org/jira/browse/FLINK-5048).

Which version of Flink and which Kafka Producer were you using?

Cheers,
Till

On Tue, Nov 22, 2016 at 10:03 AM, Gyula Fóra <gyf...@apache.org> wrote:

> Hi,
>
> Has anyone ever experienced the Kafka producer getting stuck in cancelling?
>
> I am aware that there were problems with the Kafka consumer before, but I
> haven't seen this one yet. It happened simultaneously to 3 of my jobs last
> night; they were stuck from about 8 pm to 8 am (not exact times, but you get
> the length).
>
> The JobManager logs don't seem to be very helpful; they just show that all
> tasks start cancelling and then reach CANCELED, except for one Kafka sink
> task. That one goes into cancelling but only gets cancelled 12 hours
> later. On one of the task managers I found this, though:
>
> 2016-11-21 20:22:52,220 INFO  org.apache.flink.yarn.YarnTaskManager - Un-registering task and sending final execution state CANCELED to JobManager for task Execute EventProcessors (f030e71787a6dbd7a543e9745c42289d)
>
> 2016-11-22 08:49:35,181 WARN  org.apache.kafka.common.network.Selector - Error in I/O with kafka17.sto.midasplayer.com/172.25.82.212
> java.io.EOFException
>         at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
>         at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-11-22 08:49:35,183 INFO  org.apache.flink.runtime.taskmanager.Task - Sink: Kafka output (2/8) switched to CANCELED
>
>
> There might have been some network/Kafka issue that caused 3 jobs to get
> stuck at the same time, but I don't know what actually happened.
>
> Any ideas?
> Gyula
>
