[ 
https://issues.apache.org/jira/browse/FLINK-11038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-11038:
-----------------------------------
    Description: 
Currently the Kafka at-least-once IT cases use {{NetworkFailuresProxy}}, which
is unstable both for Kafka 0.11 in exactly-once mode (50% of the tests
live-lock) and for Kafka 2.0 (because of that, the
{{testOneToOneAtLeastOnceRegularSink}} and
{{testOneToOneAtLeastOnceCustomOperator}} tests are currently disabled).

Those tests should be rewritten to SIGKILL the Flink process doing the
writing, either as an ITCase that SIGKILLs the task manager or as a test
harness test that SIGKILLs (or exits) the test harness process.

We cannot simply use the test harness and leave it unclosed to simulate a
failure, because we want to make sure that the records were flushed during
the checkpoint. If we do not SIGKILL the process, the Kafka client's
background threads can still send those records for us.

  was:
Currently the Kafka at-least-once IT cases use {{NetworkFailuresProxy}}, which
is unstable both for Kafka 0.11 in exactly-once mode (50% of the tests
live-lock) and for Kafka 2.0.

Those tests should be rewritten to SIGKILL the Flink process doing the
writing, either as an ITCase that SIGKILLs the task manager or as a test
harness test that SIGKILLs (or exits) the test harness process.

We cannot simply use the test harness and leave it unclosed to simulate a
failure, because we want to make sure that the records were flushed during
the checkpoint. If we do not SIGKILL the process, the Kafka client's
background threads can still send those records for us.


> Rewrite Kafka at-least-once it cases
> ------------------------------------
>
>                 Key: FLINK-11038
>                 URL: https://issues.apache.org/jira/browse/FLINK-11038
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.7.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>
> Currently the Kafka at-least-once IT cases use {{NetworkFailuresProxy}}, which
> is unstable both for Kafka 0.11 in exactly-once mode (50% of the tests
> live-lock) and for Kafka 2.0 (because of that, the
> {{testOneToOneAtLeastOnceRegularSink}} and
> {{testOneToOneAtLeastOnceCustomOperator}} tests are currently disabled).
> Those tests should be rewritten to SIGKILL the Flink process doing the
> writing, either as an ITCase that SIGKILLs the task manager or as a test
> harness test that SIGKILLs (or exits) the test harness process.
> We cannot simply use the test harness and leave it unclosed to simulate a
> failure, because we want to make sure that the records were flushed during
> the checkpoint. If we do not SIGKILL the process, the Kafka client's
> background threads can still send those records for us.
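
A minimal sketch of the "kill the writing process" idea described above, not
the actual Flink test code. It assumes the writer runs in a separate child
process and a Unix-like host; the class name SigkillSketch, the method
startAndKill, and the "sleep 60" stand-in command are all hypothetical. It
relies on Process.destroyForcibly(), which on Linux JDKs is expected to
deliver SIGKILL, so the child gets no chance to run shutdown hooks or let
background client threads drain their buffers.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class SigkillSketch {

    /**
     * Starts the given command as a child process and forcibly kills it.
     * Returns true if the child is confirmed dead within the timeout.
     * (Hypothetical helper, for illustration only.)
     */
    static boolean startAndKill(String... command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();

        // In a real test, one would wait here until the writer reports a
        // completed checkpoint, so only checkpoint-flushed records count.

        // Forcible termination: unlike destroy(), this does not ask the
        // child to shut down gracefully.
        child.destroyForcibly();

        // waitFor returns true only if the child exited within the timeout.
        return child.waitFor(10, TimeUnit.SECONDS) && !child.isAlive();
    }

    public static void main(String[] args) throws Exception {
        // "sleep 60" is a stand-in for the JVM that runs the Kafka sink.
        boolean killed = startAndKill("sleep", "60");
        System.out.println("child killed: " + killed);
    }
}
```

After the kill, a test along these lines would read the topic back and check
that every record acknowledged before the last completed checkpoint is there.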



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
