libenchao commented on a change in pull request #11482: [FLINK-16581][table] Minibatch deduplication lack state TTL bug fix
URL: https://github.com/apache/flink/pull/11482#discussion_r409348488
##########
File path: flink-table/flink-table-runtime-blink/src/test/java/org/apache/flink/table/runtime/operators/deduplicate/MiniBatchDeduplicateKeepFirstRowFunctionTest.java
##########
@@ -71,4 +71,41 @@ public void testKeepFirstRowWithGenerateRetraction() throws Exception {
         testHarness.close();
     }
+    @Test
+    public void tesKeepFirstRowWithStateTtl() throws Exception {
+        MiniBatchDeduplicateKeepFirstRowFunction func = new MiniBatchDeduplicateKeepFirstRowFunction(typeSerializer, minTime.toMilliseconds());
+        OneInputStreamOperatorTestHarness<BaseRow, BaseRow> testHarness = createTestHarness(func);
+        testHarness.setup();
+        testHarness.open();
+        testHarness.processElement(record("book", 1L, 12));
+        testHarness.processElement(record("book", 2L, 11));
+        // output is empty because bundle not trigger yet.
+        Assert.assertTrue(testHarness.getOutput().isEmpty());
+        testHarness.processElement(record("book", 1L, 13));
+
+        testHarness.setStateTtlProcessingTime(30);
+        //Incremental cleanup is an eventual clean up, more state access guarantee more expired state cleaned
+        for (long i = 3; i < 30; i++) {
+            testHarness.processElement(record("book", i, 20));

Review comment:
@Myasuka Thanks for the input. +1 for using the `NeverReturnExpired` strategy. We have received other questions about this on the user mailing list too. IMHO, `NeverReturnExpired` is more straightforward for users to understand.
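For context, the difference between the two visibility settings can be sketched with a toy model. This is plain Java, not Flink code: `TtlMap`, `setTime`, and `cleanup` are hypothetical names invented for illustration, and only the two enum constant names are borrowed from Flink's `StateTtlConfig.StateVisibility`. The sketch assumes an entry counts as expired once its age reaches the TTL.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of TTL state visibility. Illustration only, NOT Flink's implementation. */
public class TtlVisibilityDemo {

    /** Constant names mirror Flink's StateTtlConfig.StateVisibility. */
    enum Visibility { NeverReturnExpired, ReturnExpiredIfNotCleanedUp }

    /** A stored value together with the time it was last written. */
    static final class Entry {
        final String value;
        final long writtenAt;

        Entry(String value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    /** Hypothetical keyed store with a manually advanced clock. */
    static final class TtlMap {
        private final Map<String, Entry> store = new HashMap<>();
        private final long ttlMillis;
        private final Visibility visibility;
        private long now; // starts at 0 and only moves when setTime is called

        TtlMap(long ttlMillis, Visibility visibility) {
            this.ttlMillis = ttlMillis;
            this.visibility = visibility;
        }

        void setTime(long now) {
            this.now = now;
        }

        void put(String key, String value) {
            store.put(key, new Entry(value, now));
        }

        private boolean expired(Entry e) {
            return now - e.writtenAt >= ttlMillis;
        }

        /** Returns the value, or null if it is absent or hidden by the visibility rule. */
        String get(String key) {
            Entry e = store.get(key);
            if (e == null) {
                return null;
            }
            if (expired(e) && visibility == Visibility.NeverReturnExpired) {
                return null; // hidden even though it has not been physically removed yet
            }
            return e.value;
        }

        /** Simulates cleanup physically removing every expired entry. */
        void cleanup() {
            store.values().removeIf(this::expired);
        }
    }

    public static void main(String[] args) {
        for (Visibility v : Visibility.values()) {
            TtlMap state = new TtlMap(10, v);
            state.put("book", "first");
            state.setTime(30); // well past the TTL, but no cleanup has run
            System.out.println(v + " before cleanup: " + state.get("book"));
            state.cleanup();
            System.out.println(v + " after cleanup: " + state.get("book"));
        }
    }
}
```

In this model, `NeverReturnExpired` makes the result of a read depend only on the clock, whereas `ReturnExpiredIfNotCleanedUp` makes it depend on whether cleanup has already reached the entry. That is why a test using the latter has to generate extra state accesses before asserting.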