GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/3778

    [FLINK-5969] Add savepoint backwards compatibility tests from 1.2 to 1.3

    The binary savepoints and snapshots in the tests were created on the commit 
of the Flink 1.2.0 release, so we test backwards compatibility within the Flink 
1.2.x line. Once this is approved I'll open another PR that transplants these 
commits on the master branch (with the binary snapshots/savepoints done on 
Flink 1.2.0) so that we test migration compatibility between 1.2.0 and what is 
going to be Flink 1.3.x.
    
    I changed the naming of some existing tests so we now have 
`*From11MigrationTest` and `*From12MigrationTest` (and one ITCase). Immediately 
after releasing Flink 1.3.0 we should do the same, i.e. introduce 
`*From13MigrationTest` and ITCase based on the existing tests.
    
    The unit tests are fairly straightforward: we feed some data into an 
operator using an operator test harness, then we take a snapshot. (This is the 
part that has to be done on the "old" version to generate the binary snapshot 
that goes into the repo.) The actual tests restore an operator from that 
snapshot and verify the output.
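
The snapshot-then-restore pattern of the unit tests can be sketched without any Flink dependency. This is only an illustration under stated assumptions, not the code from this PR: `CountingOperator`, `snapshot()` and `restore()` are hypothetical stand-ins for an operator, the harness snapshot call and the restore step, and a byte array stands in for the binary snapshot file that would be checked into the repo.

```java
import java.nio.ByteBuffer;

final class SnapshotRoundTripSketch {

    // Hypothetical stand-in for a stateful operator; not a Flink class.
    static final class CountingOperator {
        private long count;

        // Process one element and emit the running count as "output".
        long processElement(String element) {
            count++;
            return count;
        }

        // Serialize the state. In a real migration test this is the part
        // that runs on the "old" version and produces the binary file.
        byte[] snapshot() {
            return ByteBuffer.allocate(Long.BYTES).putLong(count).array();
        }

        // Build a fresh operator from a previously written snapshot.
        static CountingOperator restore(byte[] snapshot) {
            CountingOperator operator = new CountingOperator();
            operator.count = ByteBuffer.wrap(snapshot).getLong();
            return operator;
        }
    }

    // Feed two elements, snapshot, restore into a new instance, feed one more.
    static long runRoundTrip() {
        CountingOperator oldOperator = new CountingOperator();
        oldOperator.processElement("a");
        oldOperator.processElement("b");
        byte[] snapshot = oldOperator.snapshot();

        CountingOperator restored = CountingOperator.restore(snapshot);
        // The restored operator continues from the snapshotted count.
        return restored.processElement("c");
    }

    public static void main(String[] args) {
        System.out.println(runRoundTrip());
    }
}
```

In the actual tests the two halves run on different Flink versions, which is why the snapshot has to be persisted as a file rather than passed in memory as it is here.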
    
    The ITCase is a bit more involved. We have a complete Job of user-functions 
and custom operators that tries to cover as many state/timer combinations as 
possible. We start the job and, using accumulators, observe the number of 
received elements in the sink. Once we get all elements we perform a savepoint 
and cancel the job. Thus we have all state caused by the elements reflected in 
our savepoint. This has to be done on the "old" version and the savepoint goes 
into the repo. The restoring job is instrumented with code that verifies 
restored state and updates accumulators. We listen on the accumulator changes 
and cancel the job once we have seen all required verifications.
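
The "observe the accumulator, then act" step can be sketched as a polling loop. This is a hedged illustration that does not use the Flink client API: `waitForAccumulator` and the `IntSupplier` standing in for the accumulator are hypothetical names, and the point where the real test would trigger the savepoint and cancel the job is marked only by a comment.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

final class AccumulatorWaitSketch {

    // Poll until the accumulator reaches `expected` or the timeout elapses.
    // Returns true if the expected value was observed.
    static boolean waitForAccumulator(IntSupplier accumulator, int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            if (accumulator.getAsInt() >= expected) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check in case the value arrived right at the deadline.
        return accumulator.getAsInt() >= expected;
    }

    // A thread stands in for the running job's sink updating its accumulator.
    static boolean runDemo() {
        AtomicInteger received = new AtomicInteger();
        Thread job = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                received.incrementAndGet();
            }
        });
        job.start();
        boolean sawAll = waitForAccumulator(received::get, 5, 5_000);
        // At this point the real test would take the savepoint and cancel the job.
        return sawAll;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Bounding the wait with a timeout keeps a job that never delivers all elements from hanging the test.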

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-5969-backwards-compat-12-13-on-release12

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3778
    
----
commit ef9e73a1f8af8903b0689eada2a9d853034fab88
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-20T12:48:22Z

    [FLINK-5969] Augment SavepointMigrationTestBase to catch failed jobs

commit 47143ba424355b7d25e9990bc308ea1744a0f33e
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-20T15:09:00Z

    [FLINK-5969] Add savepoint IT case that checks restore from 1.2
    
    The binary savepoints in this were created on the Flink 1.2.0 release
    commit.

commit 3803dc04caae5e57f2cb23df0b6bc4663f8af08e
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-21T09:43:53Z

    [FLINK-6353] Fix legacy user-state restore from 1.2
    
    State that was checkpointed using Checkpointed (on a user function)
    could not be restored using CheckpointedRestoring when the savepoint was
    done on Flink 1.2. The reason was an overzealous check in
    AbstractUdfStreamOperator that only restores from "legacy" operator
    state using CheckpointedRestoring when the stream is a Migration stream.
    
    This removes that check but we still need to make sure to read away the
    byte that indicates whether there is legacy state, which is written when
    we're restoring from a Flink 1.1 savepoint.
    
    After this fix, the procedure for a user to migrate a user function away
    from the Checkpointed interface is this:
    
     - Perform a savepoint with the user function still implementing
       Checkpointed, shut down the job
     - Change the user function to implement CheckpointedRestoring
     - Restore from the previous savepoint; the user function has to move
       the state that is restored using CheckpointedRestoring to another
       type of state, e.g. operator state, using the OperatorStateStore
     - Perform another savepoint, shut down the job
     - Remove the CheckpointedRestoring interface from the user function
     - Restore from the second savepoint
     - Done.
    
    If the CheckpointedRestoring interface is not removed as prescribed in
    the last steps, then a future restore of a new savepoint will fail
    because Flink will try to read legacy operator state that is not there
    anymore. The above steps also apply to Flink 1.3, when a user wants to
    move away from the Checkpointed interface.
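
The intermediate step of the procedure above (the user function implements CheckpointedRestoring and moves the restored value into another kind of state) might look roughly like the following sketch. It is self-contained and therefore uses local stand-ins: the nested `CheckpointedRestoring` interface is modeled on Flink's interface of the same name, while `FakeListState` and `CountingFunction` are hypothetical and only imitate list-style operator state obtained from an OperatorStateStore. Creating the new state in the constructor is also an assumption made for brevity.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class CheckpointedMigrationSketch {

    // Local stand-in modeled on Flink's CheckpointedRestoring interface.
    interface CheckpointedRestoring<T extends Serializable> {
        void restoreState(T state) throws Exception;
    }

    // Hypothetical stand-in for list-style operator state.
    static final class FakeListState<T> {
        private final List<T> values = new ArrayList<>();

        void add(T value) {
            values.add(value);
        }

        List<T> get() {
            return values;
        }
    }

    // User function during the intermediate step: it no longer snapshots
    // through Checkpointed, but can still receive the legacy state.
    static final class CountingFunction implements CheckpointedRestoring<Long> {
        private final FakeListState<Long> operatorState = new FakeListState<>();

        @Override
        public void restoreState(Long legacyState) {
            // Move the legacy value into the new kind of state so that the
            // next savepoint no longer depends on the legacy mechanism.
            operatorState.add(legacyState);
        }

        long currentCount() {
            long sum = 0;
            for (long value : operatorState.get()) {
                sum += value;
            }
            return sum;
        }
    }

    static long demo() {
        CountingFunction function = new CountingFunction();
        // Stands in for the runtime handing over state from the old savepoint.
        function.restoreState(42L);
        return function.currentCount();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Once a second savepoint has been taken from a job like this, the `CheckpointedRestoring` implementation can be dropped, as the last steps of the list describe.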

commit f08661adcf3a64daf955ace70683ef2fe14cec2c
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T09:25:32Z

    [FLINK-5969] Add ContinuousFileProcessingFrom12MigrationTest
    
    The binary snapshots were created on the Flink 1.2 branch.

commit e70424eb6c9861e89c78f12143f319ce6eea49c1
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T10:31:53Z

    [FLINK-5969] Add OperatorSnapshotUtil
    
    This has methods for storing/reading OperatorStateHandles, as returned
    from stream operator test harnesses. This can be used to write binary
    snapshots for use in state migration tests.

commit 0217a2c3273157d4da936056fa5c76237d67b355
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T13:12:14Z

    [FLINK-5969] Add KafkaConsumerBaseFrom12MigrationTest
    
    The binary snapshots were created on the Flink 1.2 branch.

commit 6d3386bdb57e74ffecab76db211692aa734edf52
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-25T10:05:22Z

    [FLINK-5969] Rename StatefulUDFSavepointFrom*MigrationITCases

commit f63e52c367bf85d328b9b6b3913ffe7dbd935d11
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T15:13:27Z

    [FLINK-5969] Add WindowOperatorFrom12MigrationTest
    
    The binary snapshots for this were created on the Flink 1.2 branch.

commit 525f98de5a90752918c7620ffaf2490d9c540452
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T15:13:49Z

    [FLINK-5969] Also snapshot legacy state in operator test harness

commit 84fd38670dacf9f445f4361e85a494ad7512c3df
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2017-04-24T15:50:59Z

    [FLINK-5969] Add BucketingSinkFrom12MigrationTest
    
    The binary snapshots have been created on the Flink 1.2 branch.

----

