[ 
https://issues.apache.org/jira/browse/FLINK-22140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321879#comment-17321879
 ] 

Yun Gao commented on FLINK-22140:
---------------------------------

Tested the unified binary savepoint with [an artificial 
job|https://github.com/gaoyunhaii/flink1.13test]:
 # The job has a source that generates the tuple (index % numberOfKeys, index / 
numberOfKeys) for each index in [0, numberOfRecords). The input is fed into 
operators with different types of keyed state. There are 5 types of state in 
total, namely value state, reducing state, aggregating state, list state and 
map state.
 # During the execution of the job, we run stop-with-savepoint to create a 
savepoint.
 # Then we start another job from the savepoint to continue execution. The new 
job continues to run until the expected number of records has been emitted. 
Then the operators write their final state content into files, and the files 
are compared with the expected content.
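
As a rough illustration of how the expected content can be derived, here is a minimal plain-Java sketch (no Flink dependency) that replays the source formula and collects, per key, the values that key should have seen. The class and method names are hypothetical and are not taken from the test job. The "last" and "sum" columns are only assumed stand-ins for what a value state or a reducing state might hold, since the actual state functions are defined in the linked repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the actual test job.
public class ExpectedStateSketch {

    /** For each key, collects the values the source emits for it, in emission order. */
    static Map<Long, List<Long>> expectedValuesPerKey(long numberOfRecords, long numberOfKeys) {
        Map<Long, List<Long>> perKey = new TreeMap<>();
        for (long index = 0; index < numberOfRecords; index++) {
            long key = index % numberOfKeys;   // first tuple field
            long value = index / numberOfKeys; // second tuple field
            perKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return perKey;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> expected = expectedValuesPerKey(10, 3);
        expected.forEach((key, values) -> {
            // Assumption: a list state would hold all values, a value state the last one,
            // and a reducing state their sum. The real job may define these differently.
            long last = values.get(values.size() - 1);
            long sum = values.stream().mapToLong(Long::longValue).sum();
            System.out.println("key=" + key + " list=" + values + " last=" + last + " sum=" + sum);
        });
    }
}
```

Because the expected content depends only on numberOfRecords and numberOfKeys, it should be the same regardless of where in the stream the savepoint was taken, which is what makes the final file comparison meaningful.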

We tested the following cases:
 # Start the first job with one state backend from (HashMap, RocksDB and 
incremental RocksDB), and start the second job with another state backend. We 
tested all 9 combinations.
 # For the 9 cases in the first item, start the second job with a larger 
parallelism.
 # For the 9 cases in the first item, start the second job with a smaller 
parallelism.
 # For all of the above three items, change the key type and value type to 
customized user types.
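
To make the size of this test matrix concrete, the following plain-Java sketch enumerates the combinations as I read them from the list above: 3 x 3 backend pairs, three parallelism variants (unchanged, larger, smaller), and two kinds of key/value types. The enum and class names are hypothetical labels for illustration only, and the total assumes the first item keeps the parallelism unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the tested combinations, not code from the test job.
public class SavepointTestMatrixSketch {

    enum Backend { HASHMAP, ROCKSDB, INCREMENTAL_ROCKSDB }

    enum Rescale { SAME, LARGER, SMALLER }

    enum Types { BUILT_IN, CUSTOM }

    /** Returns one label per (types, rescale, first backend, second backend) combination. */
    static List<String> allCases() {
        List<String> cases = new ArrayList<>();
        for (Types types : Types.values()) {
            for (Rescale rescale : Rescale.values()) {
                for (Backend first : Backend.values()) {
                    for (Backend second : Backend.values()) {
                        cases.add(types + "/" + rescale + ": " + first + " -> " + second);
                    }
                }
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        List<String> cases = allCases();
        cases.forEach(System.out::println);
        // 2 type variants * 3 parallelism variants * 9 backend pairs
        System.out.println("total cases: " + cases.size());
    }
}
```

Counted this way, the matrix has 54 cases, 27 with the original key and value types and 27 with customized user types.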

We verified that
 # The savepoint is taken successfully, and both jobs finish normally. 
 # The resulting state backend content is as expected.
 # The checkpoints of the two jobs are taken successfully.
 # There is no unexpected behavior during the test process.

The test results are good for all cases, with only two minor issues:
 # The Web UI does not show the configured state backend type and storage 
type, so it is not easy for users to verify which state backend is in use.
 # We should not print the stack trace when checkpoints fail because not all 
tasks are running (https://issues.apache.org/jira/browse/FLINK-22117); 
otherwise the log is overwhelmed by this kind of exception.
  

 

> Test the unified binary savepoint
> ---------------------------------
>
>                 Key: FLINK-22140
>                 URL: https://issues.apache.org/jira/browse/FLINK-22140
>             Project: Flink
>          Issue Type: Task
>          Components: Runtime / State Backends
>    Affects Versions: 1.13.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Yun Gao
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.13.0
>
>
> With https://issues.apache.org/jira/browse/FLINK-20976 we introduced a 
> unified binary savepoint format which should let you switch between different 
> state backends. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
