[jira] [Commented] (FLINK-33729) Events are getting lost when an exception occurs within a processing function

Yufan Sheng (Jira) Mon, 15 Apr 2024 03:04:21 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837157#comment-17837157
 ]


Yufan Sheng commented on FLINK-33729:
-------------------------------------

[~Weijie Guo] Yep, this is a valid ticket, but the problem is not on the Flink 
side. We need to figure out why the transaction didn't work on Pulsar, so I 
think this ticket should be submitted to the Pulsar community.

[~rtrojczak] I can't find any checkpoint configuration in your sample code. I 
can only see two lines of code that enable checkpointing. I think this is the 
main reason your code doesn't work as expected. Flink checkpointing needs more 
configuration than that, such as the checkpoint storage. You can check the 
link below to get your application properly configured.

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#enabling-and-configuring-checkpointing
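
As a rough sketch of the kind of settings meant here, a checkpoint section in 
the Flink configuration (flink-conf.yaml, or flinkConfiguration when using the 
Kubernetes Operator) might look like the fragment below. The interval, timeout, 
and the checkpoint directory are placeholder assumptions, not values taken from 
the sample project, and would need to be adapted to the actual deployment.

```yaml
# Illustrative values only; tune them for the real job.
execution.checkpointing.interval: 10s
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.timeout: 10min
execution.checkpointing.min-pause: 5s
# Keep the last checkpoint when the job is cancelled so it can be restored.
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
state.backend: hashmap
# Assumed path: must be durable storage reachable by the JobManager and all TaskManagers.
state.checkpoints.dir: file:///tmp/flink-checkpoints
```

On Kubernetes a pod-local file path would not survive a TaskManager restart, 
so a shared or remote location is the usual choice for the directory.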

> Events are getting lost when an exception occurs within a processing function
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-33729
>                 URL: https://issues.apache.org/jira/browse/FLINK-33729
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Pulsar
>    Affects Versions: 1.15.3
>            Reporter: Rafał Trójczak
>            Priority: Major
>
> We have a Flink job using a Pulsar source that reads from an input topic, and 
> a Pulsar sink that writes to an output topic. Both Flink and the Pulsar 
> connector are version 1.15.3. The Pulsar version that I use is 2.10.3.
> Here is a simple project that is intended to reproduce this problem: 
> [https://github.com/trojczak/flink-pulsar-connector-problem/]
> All of my tests were done on my local Kubernetes cluster using the Flink 
> Kubernetes Operator, with Pulsar running in my local Docker. But the same 
> problem occurred on a "normal" cluster.
> Expected behavior: When an exception is thrown within the code (or a 
> TaskManager pod is restarted for any other reason, e.g. OOM exception), the 
> processing should be picked up from the last event sent to the output topic.
> Actual behavior: The events before the failure are sent correctly to the 
> output topic, then some of the events from the input topic are missing, and 
> from some point on the events are processed normally until the next 
> exception is thrown, and so on. In the end, out of 100 events that should be 
> sent from the input topic to the output topic, only 40 are sent.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
