[ https://issues.apache.org/jira/browse/FLINK-34200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812233#comment-17812233 ]

Matthias Pohl commented on FLINK-34200:
---------------------------------------

I did a local test run with the following diff to check whether the failure is 
still reproducible:
{code:java}
diff --git a/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PrioritizedOperatorSubtaskState.java b/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PrioritizedOperatorSubtaskState.java
index e41bcfe7338..676e738ff45 100644
--- a/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PrioritizedOperatorSubtaskState.java
+++ b/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PrioritizedOperatorSubtaskState.java
@@ -290,14 +290,14 @@ public class PrioritizedOperatorSubtaskState {
             }
 
             return new PrioritizedOperatorSubtaskState(
-                    computePrioritizedAlternatives(
+                    resolvePrioritizedAlternatives(
                             jobManagerState.getManagedKeyedState(),
                             managedKeyedAlternatives,
-                            KeyedStateHandle::getKeyGroupRange),
-                    computePrioritizedAlternatives(
+                            eqStateApprover(KeyedStateHandle::getKeyGroupRange)),
+                    resolvePrioritizedAlternatives(
                             jobManagerState.getRawKeyedState(),
                             rawKeyedAlternatives,
-                            KeyedStateHandle::getKeyGroupRange),
+                            eqStateApprover(KeyedStateHandle::getKeyGroupRange)),
                     resolvePrioritizedAlternatives(
                             jobManagerState.getManagedOperatorState(),
                             managedOperatorAlternatives,
{code}
Even with the above change, the error appeared in the 2nd repetition. According 
to [~srichter], this means it must be either a test setup issue or a hidden 
issue that was only exposed by introducing {{AutoRescalingITCase}}.

[~srichter], do we have someone who can look into this in more detail? I don't 
have the capacity right now.

> AutoRescalingITCase#testCheckpointRescalingInKeyedState fails
> -------------------------------------------------------------
>
>                 Key: FLINK-34200
>                 URL: https://issues.apache.org/jira/browse/FLINK-34200
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.19.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>         Attachments: FLINK-34200.failure.log.gz
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56601&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8200]
> {code:java}
> Jan 19 02:31:53 02:31:53.954 [ERROR] Tests run: 32, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1050 s <<< FAILURE! -- in org.apache.flink.test.checkpointing.AutoRescalingITCase
> Jan 19 02:31:53 02:31:53.954 [ERROR] org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState[backend = rocksdb, buffersPerChannel = 2] -- Time elapsed: 59.10 s <<< FAILURE!
> Jan 19 02:31:53 java.lang.AssertionError: expected:<[(0,8000), (0,32000), 
> (0,48000), (0,72000), (1,78000), (1,30000), (1,54000), (0,2000), (0,10000), 
> (0,50000), (0,66000), (0,74000), (0,82000), (1,80000), (1,0), (1,16000), 
> (1,24000), (1,40000), (1,56000), (1,64000), (0,12000), (0,28000), (0,52000), 
> (0,60000), (0,68000), (0,76000), (1,18000), (1,26000), (1,34000), (1,42000), 
> (1,58000), (0,6000), (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), 
> (0,70000), (1,4000), (1,20000), (1,36000), (1,44000)]> but was:<[(0,8000), 
> (0,32000), (0,48000), (0,72000), (1,78000), (1,30000), (1,54000), (0,2000), 
> (0,10000), (0,50000), (0,66000), (0,74000), (0,82000), (1,80000), (1,0), 
> (1,16000), (1,24000), (1,40000), (1,56000), (1,64000), (0,12000), (0,28000), 
> (0,52000), (0,60000), (0,68000), (0,76000), (0,1000), (0,25000), (0,33000), 
> (0,41000), (1,18000), (1,26000), (1,34000), (1,42000), (1,58000), (0,6000), 
> (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), (0,70000), (1,4000), 
> (1,20000), (1,36000), (1,44000)]>
> Jan 19 02:31:53       at org.junit.Assert.fail(Assert.java:89)
> Jan 19 02:31:53       at org.junit.Assert.failNotEquals(Assert.java:835)
> Jan 19 02:31:53       at org.junit.Assert.assertEquals(Assert.java:120)
> Jan 19 02:31:53       at org.junit.Assert.assertEquals(Assert.java:146)
> Jan 19 02:31:53       at org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingKeyedState(AutoRescalingITCase.java:296)
> Jan 19 02:31:53       at org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState(AutoRescalingITCase.java:196)
> Jan 19 02:31:53       at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 19 02:31:53       at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> {code}
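
The expected and actual sets in the assertion message above are hard to compare by eye. The following is a minimal sketch for diffing them, not part of Flink or of the failing test: the class {{AssertionDiff}} and its methods are hypothetical names introduced here for illustration, and the two strings are copied from the failure output.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Flink): diffs the tuple sets from the assertion message.
class AssertionDiff {

    // Copied from the "expected:<[...]>" part of the failure output.
    static final String EXPECTED =
            "(0,8000), (0,32000), (0,48000), (0,72000), (1,78000), (1,30000), (1,54000), "
                    + "(0,2000), (0,10000), (0,50000), (0,66000), (0,74000), (0,82000), "
                    + "(1,80000), (1,0), (1,16000), (1,24000), (1,40000), (1,56000), "
                    + "(1,64000), (0,12000), (0,28000), (0,52000), (0,60000), (0,68000), "
                    + "(0,76000), (1,18000), (1,26000), (1,34000), (1,42000), (1,58000), "
                    + "(0,6000), (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), "
                    + "(0,70000), (1,4000), (1,20000), (1,36000), (1,44000)";

    // Copied from the "but was:<[...]>" part of the failure output.
    static final String ACTUAL =
            "(0,8000), (0,32000), (0,48000), (0,72000), (1,78000), (1,30000), (1,54000), "
                    + "(0,2000), (0,10000), (0,50000), (0,66000), (0,74000), (0,82000), "
                    + "(1,80000), (1,0), (1,16000), (1,24000), (1,40000), (1,56000), "
                    + "(1,64000), (0,12000), (0,28000), (0,52000), (0,60000), (0,68000), "
                    + "(0,76000), (0,1000), (0,25000), (0,33000), (0,41000), (1,18000), "
                    + "(1,26000), (1,34000), (1,42000), (1,58000), (0,6000), (0,14000), "
                    + "(0,22000), (0,38000), (0,46000), (0,62000), (0,70000), (1,4000), "
                    + "(1,20000), (1,36000), (1,44000)";

    // Matches one "(subtask,value)" tuple as printed in the assertion message.
    private static final Pattern TUPLE = Pattern.compile("\\(\\d+,\\d+\\)");

    // Extracts all tuples from the text, preserving their order of appearance.
    static Set<String> parseTuples(String text) {
        Set<String> tuples = new LinkedHashSet<>();
        Matcher matcher = TUPLE.matcher(text);
        while (matcher.find()) {
            tuples.add(matcher.group());
        }
        return tuples;
    }

    // Returns the elements of 'left' that do not occur in 'right'.
    static List<String> onlyIn(Set<String> left, Set<String> right) {
        List<String> result = new ArrayList<>();
        for (String tuple : left) {
            if (!right.contains(tuple)) {
                result.add(tuple);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> expected = parseTuples(EXPECTED);
        Set<String> actual = parseTuples(ACTUAL);
        System.out.println("unexpected: " + onlyIn(actual, expected));
        System.out.println("missing:    " + onlyIn(expected, actual));
    }
}
```

For this failure, the diff reports no missing tuples and four unexpected ones: (0,1000), (0,25000), (0,33000) and (0,41000). So the actual result is a superset of the expected one.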



