[ 
https://issues.apache.org/jira/browse/FLINK-22132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332502#comment-17332502
 ] 

Anton Kalashnikov edited comment on FLINK-22132 at 4/26/21, 3:33 PM:
---------------------------------------------------------------------

I checked the suggested scenarios and didn't find any problem specific to 
unaligned checkpoints.

Settings:

*Cluster*: Amazon EMR (4 instances: 4 vCore, 16 GiB memory)
*Cluster run*: ./bin/yarn-session.sh --detached
*Jobs for testing*: DataStreamAllroundTestProgram and the simpler 
TopSpeedWindowing.
*Checkpoint*: unaligned
*Job arguments*: \-\-environment.externalize_checkpoint true 
\-\-environment.parallelism 2 \-\-state_backend.checkpoint_directory 
s3://anton-flink-test/checkpoints \-\-state_backend rocks
*Parallelism*: 1 to 9 (for reference, DataStreamAllroundTestProgram has 9 tasks, 
so at most 9 * 9 = 81 subtasks)
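
To make the parallelism arithmetic concrete, here is a small illustrative sketch 
in Python. It assumes every one of the 9 tasks runs at the same parallelism, 
which is what the 9 * 9 = 81 bound implies. The helper names are hypothetical, 
and the (old, new) rescaling pairs are only one possible test matrix; the comment 
does not say which restores were actually run.

```python
# Hypothetical helpers illustrating the parallelism sweep described above.
# Assumption: every task in the job runs at the same parallelism, which is
# how the "9 * 9 = 81" upper bound is derived.

NUM_TASKS = 9                     # DataStreamAllroundTestProgram, per the comment
PARALLELISM_RANGE = range(1, 10)  # parallelism 1..9


def subtask_count(num_tasks: int, parallelism: int) -> int:
    """Total parallel subtasks if every task uses the same parallelism."""
    return num_tasks * parallelism


def sweep(num_tasks: int, parallelisms) -> dict:
    """Map each tested parallelism to the resulting number of subtasks."""
    return {p: subtask_count(num_tasks, p) for p in parallelisms}


def rescale_pairs(parallelisms) -> list:
    """All (old, new) pairs for restoring a checkpoint at a different parallelism.

    Hypothetical test matrix: not necessarily the pairs that were run.
    """
    ps = list(parallelisms)
    return [(old, new) for old in ps for new in ps if old != new]


if __name__ == "__main__":
    counts = sweep(NUM_TASKS, PARALLELISM_RANGE)
    print(counts)
    print("max subtasks:", max(counts.values()))
    print("rescale scenarios:", len(rescale_pairs(PARALLELISM_RANGE)))
```

With nine parallelism values, restoring into every other value gives 9 * 8 = 72 
rescaling scenarios, which shows why a full matrix quickly becomes expensive on 
a 4-node cluster.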

 

A small note from me: when hashmap is used as the state backend, many problems 
appear, for example OOM errors or network timeouts, but they occur with both 
aligned and unaligned checkpoints. So, again, I didn't find any problems 
specific to unaligned checkpoints.



> Test unaligned checkpoints rescaling manually on a real cluster
> ---------------------------------------------------------------
>
>                 Key: FLINK-22132
>                 URL: https://issues.apache.org/jira/browse/FLINK-22132
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Piotr Nowojski
>            Assignee: Anton Kalashnikov
>            Priority: Critical
>              Labels: release-testing, test-stability
>             Fix For: 1.13.0
>
>
> To test unaligned checkpoints, we should use a few different applications 
> that use different features.
> The sinks should not be mocked but rather should be able to induce a fair 
> amount of backpressure into the system. Quite possibly, it would be a good 
> idea to have a way to add more backpressure to the sink by running the 
> respective system on the cluster and be able to add/remove parallel instances.
> The primary objective is to check if all data is recovered properly and if 
> the semantics are correct (does state match input?).
> The secondary objective is to check if Flink UI shows the information 
> correctly.
> More details in the subtasks.
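
The "does state match input?" check from the description can be sketched as 
follows. This is a minimal illustration, assuming a hypothetical job that keeps 
a per-key running sum over (key, value) records; it is not how 
DataStreamAllroundTestProgram verifies itself, and all names are hypothetical. 
The idea is to recompute the expected state from the full input and compare it 
with the state observed after recovery.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (key, value) pairs emitted by the test source.
Record = Tuple[str, int]


def expected_state(records: Iterable[Record]) -> Dict[str, int]:
    """Per-key sum that a correct exactly-once job should hold after consuming `records`."""
    totals: Counter = Counter()
    for key, value in records:
        totals[key] += value
    return dict(totals)


def state_matches_input(records: Iterable[Record], restored: Dict[str, int]) -> bool:
    """True iff the restored per-key state equals the sum recomputed from the input."""
    return expected_state(records) == restored


def diff(records: Iterable[Record], restored: Dict[str, int]) -> Dict[str, tuple]:
    """Keys whose restored value differs from the expected one, as (expected, restored)."""
    expected = expected_state(records)
    keys = expected.keys() | restored.keys()
    return {
        k: (expected.get(k), restored.get(k))
        for k in keys
        if expected.get(k) != restored.get(k)
    }


if __name__ == "__main__":
    source = [("a", 1), ("b", 2), ("a", 3)]
    print(state_matches_input(source, {"a": 4, "b": 2}))  # True
    print(diff(source, {"a": 4}))  # {'b': (2, None)}
```

Lost records show up as a lower restored value (or a missing key), and duplicated 
records show up as a higher one, so the same comparison catches both 
at-least-once and data-loss violations for this kind of aggregate.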



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
