[ https://issues.apache.org/jira/browse/FLINK-18115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Li closed FLINK-18115.
----------------------------
    Resolution: Done

I mainly ran the stability test developed by Ali. It simulates online failure
conditions (such as network interruption, a full disk, the JM/AM process being
killed, or a TM throwing an exception) to check whether Flink jobs recover
automatically. The test ran for 5 hours and simulated multiple combinations of
failure scenarios. In every case the Flink job returned to normal and
checkpoints could still be created. The test passed.

> Manually test fault-tolerance stability on Flink 1.11
> -----------------------------------------------------
>
>                 Key: FLINK-18115
>                 URL: https://issues.apache.org/jira/browse/FLINK-18115
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / Core, API / State Processor, Build System, Client / Job Submission
>    Affects Versions: 1.11.0
>            Reporter: Aihua Li
>            Assignee: Aihua Li
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.11.0
>
>
> It mainly checks whether a Flink job can recover from various abnormal
> situations, including a full disk, network interruption, inability to connect
> to ZooKeeper, RPC message timeouts, etc.
> If the job cannot recover, the test fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)