[ https://issues.apache.org/jira/browse/FLINK-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695560#comment-15695560 ]

ASF GitHub Bot commented on FLINK-5096:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2845#discussion_r89597858
  
    --- Diff: flink-streaming-connectors/flink-connector-filesystem/src/test/java/org/apache/flink/streaming/connectors/fs/RollingSinkITCase.java ---
    @@ -638,6 +644,239 @@ public void flatMap(Tuple2<Integer, String> value,
                Assert.assertEquals(8, numFiles);
        }
     
    +   private static final String PART_PREFIX = "part";
    +   private static final String PENDING_SUFFIX = ".pending";
    +   private static final String IN_PROGRESS_SUFFIX = ".in-progress";
    +   private static final String VALID_LENGTH_SUFFIX = ".valid";
    +
    +   @Test
    +   public void testBucketStateTransitions() throws Exception {
    +           final File outDir = tempFolder.newFolder();
    +
    +           OneInputStreamOperatorTestHarness<String, Object> testHarness = createRescalingTestSink(outDir, 1, 0);
    +           testHarness.setup();
    +           testHarness.open();
    +
    +           testHarness.setProcessingTime(0L);
    +
    +           // we have a bucket size of 5 bytes, so each record will get its own bucket,
    +           // i.e. the bucket should roll after every record.
    +
    +           testHarness.processElement(new StreamRecord<>("test1", 1L));
    +           testHarness.processElement(new StreamRecord<>("test2", 1L));
    +           checkFs(outDir, 1, 1, 0, 0);
    +
    +           testHarness.processElement(new StreamRecord<>("test3", 1L));
    +           checkFs(outDir, 1, 2, 0, 0);
    +
    +           testHarness.snapshot(0, 0);
    +           checkFs(outDir, 1, 2, 0, 0);
    +
    +           testHarness.notifyOfCompletedCheckpoint(0);
    +           checkFs(outDir, 1, 0, 2, 0);
    +
    +           OperatorStateHandles snapshot = testHarness.snapshot(1, 0);
    +
    +           testHarness.close();
    +           checkFs(outDir, 0, 1, 2, 0);
    +
    +           testHarness = createRescalingTestSink(outDir, 1, 0);
    +           testHarness.setup();
    +           testHarness.initializeState(snapshot);
    +           testHarness.open();
    +           checkFs(outDir, 0, 0, 3, 1);
    +
    +           snapshot = testHarness.snapshot(2, 0);
    +
    +           testHarness.processElement(new StreamRecord<>("test4", 10));
    +           checkFs(outDir, 1, 0, 3, 1);
    +
    +           testHarness = createRescalingTestSink(outDir, 1, 0);
    +           testHarness.setup();
    +           testHarness.initializeState(snapshot);
    +           testHarness.open();
    +
    +           // the in-progress file remains as we do not clean up now
    +           checkFs(outDir, 1, 0, 3, 1);
    +
    +           testHarness.close();
    +
    +           // at close it is not moved to final because it is not part
    +           // of the current task's state, it was just a leftover that was not cleaned up.
    +           checkFs(outDir, 1, 0, 3, 1);
    +   }
    +
    +   @Test
    +   public void testScalingDown() throws Exception {
    +           final File outDir = tempFolder.newFolder();
    +
    +           OneInputStreamOperatorTestHarness<String, Object> testHarness1 = createRescalingTestSink(outDir, 3, 0);
    +           testHarness1.setup();
    +           testHarness1.open();
    +
    +           OneInputStreamOperatorTestHarness<String, Object> testHarness2 = createRescalingTestSink(outDir, 3, 1);
    +           testHarness2.setup();
    +           testHarness2.open();
    +
    +           OneInputStreamOperatorTestHarness<String, Object> testHarness3 = createRescalingTestSink(outDir, 3, 2);
    +           testHarness3.setup();
    +           testHarness3.open();
    +
    +           testHarness1.processElement(new StreamRecord<>("test1", 0L));
    +           checkFs(outDir, 1, 0, 0, 0);
    +
    +           testHarness2.processElement(new StreamRecord<>("test2", 0L));
    +           checkFs(outDir, 2, 0, 0, 0);
    +
    +           testHarness3.processElement(new StreamRecord<>("test3", 0L));
    +           testHarness3.processElement(new StreamRecord<>("test4", 0L));
    +           checkFs(outDir, 3, 1, 0, 0);
    +
    +           // intentionally we snapshot them in non-ascending order so that the states are shuffled
    +           OperatorStateHandles mergedSnapshot = AbstractStreamOperatorTestHarness.repackageState(
    +                   testHarness3.snapshot(0, 0),
    +                   testHarness1.snapshot(0, 0),
    +                   testHarness2.snapshot(0, 0)
    +           );
    +
    +           // with the above state reshuffling, we expect the new testHarness1
    +           // to take the state of the previous testHarness3 and testHarness2
    --- End diff --
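
    The checkFs helper used throughout this hunk is not shown in the excerpt. A minimal
    sketch of what such a check could look like, assuming the four counters are the
    expected numbers of in-progress, pending, completed, and valid-length files under
    outDir, and that it relies on java.io.File and org.junit.Assert already imported by
    the test class (names and directory layout here are assumptions, not the PR's actual
    helper):

        // Hypothetical helper: count files per rolling-sink state by file-name suffix.
        private static void checkFs(File outDir, int inProgress, int pending, int completed, int valid) {
            int inProg = 0, pend = 0, compl = 0, val = 0;
            for (File bucket : outDir.listFiles()) {
                File[] files = bucket.isDirectory() ? bucket.listFiles() : new File[] { bucket };
                for (File file : files) {
                    String name = file.getName();
                    if (name.endsWith(IN_PROGRESS_SUFFIX)) {
                        inProg++;
                    } else if (name.endsWith(PENDING_SUFFIX)) {
                        pend++;
                    } else if (name.endsWith(VALID_LENGTH_SUFFIX)) {
                        val++;
                    } else if (name.contains(PART_PREFIX)) {
                        compl++;
                    }
                }
            }
            Assert.assertEquals(inProgress, inProg);
            Assert.assertEquals(pending, pend);
            Assert.assertEquals(completed, compl);
            Assert.assertEquals(valid, val);
        }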
    
    Doesn't harness1 receive the state of 3 and 1?
    
    What do you think about having testHarness2 process 5 elements? This way one
    could always accurately deduce which of the subsequent harnesses got which state.
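
To illustrate the suggestion (purely a sketch against the helpers shown in the diff above,
not code from the PR): if each harness processes a distinct number of elements before the
snapshots are repackaged, the file counts observed after restore identify exactly which of
the old states each new harness received.

    // harness1 writes 1 element, harness2 writes 5, harness3 writes 2
    testHarness1.processElement(new StreamRecord<>("test1", 0L));

    for (int i = 2; i <= 6; i++) {
        testHarness2.processElement(new StreamRecord<>("test" + i, 0L));
    }

    testHarness3.processElement(new StreamRecord<>("test7", 0L));
    testHarness3.processElement(new StreamRecord<>("test8", 0L));

    // snapshots are still repackaged out of order, as in the diff
    OperatorStateHandles mergedSnapshot = AbstractStreamOperatorTestHarness.repackageState(
            testHarness3.snapshot(0, 0),
            testHarness1.snapshot(0, 0),
            testHarness2.snapshot(0, 0)
    );

    // after scaling down and restoring, the distinct counts (1 / 5 / 2) make it
    // unambiguous which old subtask states ended up in which new harness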


> Make the RollingSink rescalable.
> --------------------------------
>
>                 Key: FLINK-5096
>                 URL: https://issues.apache.org/jira/browse/FLINK-5096
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0
>
>
> Integrate the RollingSink with the new state abstractions so that its 
> parallelism can change after restoring from a savepoint.
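
As background for the issue description: one way a sink's state becomes rescalable under the
new state abstractions is to expose it as list-style operator state, which Flink re-splits
across subtasks on restore. The sketch below uses the ListCheckpointed interface and an
invented BucketState class purely for illustration; it is not the actual RollingSink change.

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Illustrative sketch only: per-subtask state is exposed as a list element so
    // that Flink can redistribute the elements when parallelism changes on restore.
    public class RescalableSinkSketch extends RichSinkFunction<String>
            implements ListCheckpointed<RescalableSinkSketch.BucketState> {

        // invented bookkeeping class standing in for the sink's real bucket state
        public static class BucketState implements Serializable {
            final ArrayList<String> pendingFiles = new ArrayList<>();
        }

        private transient BucketState state;

        @Override
        public void open(Configuration parameters) {
            if (state == null) {
                state = new BucketState();
            }
        }

        @Override
        public void invoke(String value) throws Exception {
            // write the record and track the files it touches in `state` (omitted)
        }

        @Override
        public List<BucketState> snapshotState(long checkpointId, long timestamp) {
            // each subtask contributes one element to the operator's list state
            return Collections.singletonList(state);
        }

        @Override
        public void restoreState(List<BucketState> restored) {
            // after rescaling, a subtask may receive the elements of zero, one, or
            // several former subtasks and must merge / clean them up
            state = new BucketState();
            for (BucketState old : restored) {
                state.pendingFiles.addAll(old.pendingFiles);
            }
        }
    }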



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
