Rafał Trójczak created FLINK-33729:
--------------------------------------
Summary: Events are getting lost when an exception occurs within a
processing function
Key: FLINK-33729
URL: https://issues.apache.org/jira/browse/FLINK-33729
Project: Flink
Issue Type: Bug
Components: Connectors / Pulsar
Affects Versions: 1.15.3
Reporter: Rafał Trójczak
We have a Flink job using a Pulsar source that reads from an input topic, and a
Pulsar sink that is writing to an output topic. Both Flink and Pulsar
connector are of version 1.15.3. The Pulsar version that I use is 2.10.3.
Here is a simple project that is intended to reproduce this problem:
[https://github.com/trojczak/flink-pulsar-connector-problem/]
All of my tests were done on my local Kubernetes cluster using the Flink
Kubernetes Operator and Pulsar is running on my local Docker. But the same
problem occurred on a "normal" cluster.
Expected behavior: When an exception is thrown within the code (or a
TaskManager pod is restarted for any other reason, e.g. OOM exception), the
processing should be picked up from the last event sent to the output topic.
Actual behavior: The events before the failure are sent correctly to the output
topic, next some of the events from the input topic are missing, then from some
point the events are being processed normally until the next exception is
thrown, and so on. Finally, from 100 events that should be sent from the input
topic to the output topic, only 40 are sent.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)